ABSTRACT



Utilization of the card catalog in the main library 
(Sterling Memorial Library) of Yale University was studied over a 
period of more than a year. Traffic flow in the catalog was observed, 
and was used as the basis for scheduling interviews with a 
representative sample of catalog users at the moment of catalog use. 
More than 2000 interviews were completed. Data were collected on user
objectives and starting clues. Follow-up studies were done on the 
matches among user clues, catalog card data, and information 
available in the front matter of cataloged documents. Reasons for 
search failures were determined. In terms of immediate intent, 73 
percent of searches are document ("known item") searches and 16 
percent are subject searches; in terms of underlying interest, 56 
percent are document searches and 33 percent are subject searches.
Remaining searches are 6 percent author searches (to find out what is 
on hand from a known author or institution) and 5 percent 
bibliographic searches (to complete or verify a reference on the 
basis of catalog-card data). The importance of secondary search clues 
in achieving retrieval despite incomplete or inaccurate primary clues 
is discussed. (Author) 

Summary



Objectives and Methods

A study was conducted of the utilization of the card catalog in 
the main library (Sterling Memorial Library) of Yale University. The 
study was motivated by interest in both short-term and long-term 
improvement in catalog performance. 

The pattern of traffic flow in the catalog area was determined by 
means of frequent traffic counts that were continued for more than a 
year. Observed traffic was used as the basis for designing an interview 
schedule that would encompass a thoroughly representative sample of catalog 
users. More than two thousand catalog users were interviewed during a full 
calendar year. Users were approached and interviewed at the moment of 
initiating a catalog search. Interview techniques were designed to bring 
out many details of search objectives and starting clues with a minimum 
of probing by the interviewer. Information on the user’s academic status 
and experience was collected also. Refusal of interviews was less than 
1 percent. 

After completion of their catalog searches, users were again approached 
and queried regarding the results. Call numbers of identified documents
were recorded. Catalog cards for these documents and front matter from
these documents were copied and compared with the users' starting clues.

A number of miscellaneous studies were made, including an investigation 
of the causes of failure in unsuccessful catalog searches.

This study produced very voluminous records. Much of the analysis 
of results accomplished to date was facilitated by the use of computer 
programs for covariance analysis. There is ample opportunity for further 
analysis of available data. 

Results 

The findings of this study are listed below. Observations regarding 
gross traffic patterns in the catalog were an essential part of the study 
and, logically, should precede observations on the content of catalog 
searches; however, they provide few real surprises and are therefore 
listed last. 

1. At the instant of approach to the catalog, 73 percent of the users
are attempting a search for a particular document (known item);
16 percent are attempting a subject search; 6 percent are attempting
an author search (to find out what documents are on hand from a known
author, publisher’s series, or other source); and 5 percent are
attempting a bibliographic search (to use the information provided
by the catalog card for some document without any intention of
locating or borrowing the document).

2. Many users attempt document searches only as a means of locating 
some subject information probably contained by the documents. In 
terms of underlying objectives, only 56 percent of the searches are 
for the document as an end in itself; 33 percent are subject searches 
(16 percent directly by subject and 17 percent indirectly by a known
document).



3. No significant variations in search objectives were detected with 
respect to time of year or type of catalog user. 

4. Objects of attempted document searches were 80 percent monographs
and 20 percent periodical articles. 

5. Twenty-six percent of the catalog users are already familiar with
desired documents. Probability of previous contact with a document
tends to increase with years of library use by the individual
involved.

6. The success or failure of a catalog search is determined right at
the catalog in all bibliographic searches and in 98 percent of the
document searches. In 40 percent of the subject searches and 30
percent of the author searches, the catalog user must go elsewhere
(e.g., to the stack to look through possibly pertinent books identified
at the catalog) in order to determine the success or failure of the
catalog search. In general, 91 percent of catalog searches are evaluated
at the catalog.

7. Eighty-four percent of document searches succeeded in locating the 
desired item and its call number. The success rate for author and 
subject searches appears to be the same or nearly the same.

8. No evidence of frustration or diminishing catalog use was found
among catalog users in their first year of experience with the 
Yale libraries. Success rates appear to be about the same for all 
types of users. 

9. The principal approaches by which users attempt to enter the catalog
for document searches are: author, 62 percent; title, 28.5 percent;
subject, 4.5 percent; and editor, 4 percent. Inexperienced users
sometimes search by author or title of an analytic (0.5 percent)
but soon abandon this unproductive approach.

10. Users in their first and second years of experience at Yale altogether 
account for 55 percent of catalog use. Each of these two year groups 
has a different, somewhat atypical, distribution of approaches to
the catalog which is not evident from the third year onward.

11. Of the 16 document searches that failed out of every 100 attempted,
10 failed simply because the documents were not in the catalog
(one-fifth of these were added to the catalog between the time of
the original search and the time of the follow-up study a few months
later); 5 were in the catalog and could have been located by the user
with his starting clues; 1 could not be traced because of inadequate 
user clues. Thus, the potential for improvement of catalog service 
through expansion (especially timely expansion) of the collection 
and through better orientation of catalog users is far greater than 
the potential for improvement through expansion of catalog accessibility. 

12. There is no general agreement among catalog users regarding a best 
approach to improvement of the catalog; rather, there is diffuse 
interest in all possible approaches to improvement.

13. Title information and author information predominate over other types 
of search clues with respect to both availability and accuracy. 

14. The availability and accuracy of search clues tend generally to 
favor the title approach over the author approach. But the difference 
in usefulness between the two approaches does not appear to be large.
Circumstances of catalog selectivity under particular entry terms 
can heavily favor the author approach for some searches and the title 
approach for others. 

15. Among the accurate or possibly accurate clues known by catalog users 
that are not accessible under present cataloging practices, title-like 
clues (subtitles, short titles, analytic titles, etc.) are more common 
than author-like clues (editors, compilers, etc.). 

16. Catalog users were usually able to identify desired documents in their 
searches despite incomplete or misspelled starting clues. Neither
of two computer retrieval algorithms tested on data from this study
could approximate human performance in overcoming inadequacies of 
search clues. 

17. The use of combinations of computerized approaches shows more promise 
for effective retrieval than any single approach. However, devices 
will be required to suppress false drops, and this will probably 
necessitate retention of a variety of data elements in the computer 
store, even for simple document searches. Publication date is a 
prime choice for inclusion in such a store. Subject clues can also 
be of great value in document searches. 

18. Subject searches in a catalog frequently identify far more potentially 
useful documents than a user can profitably examine. Non-subject
data on the catalog cards can help the user to select the most promising
documents and to reject the least promising. Virtually every type of 
data element provided on a card can help to narrow the selection, but 
relative usefulness of the different elements varies with users and 
with individual searches. Filing by publication date within a subject
heading would probably be of more general value than the current
practice of filing by main entry.

19. Transcription of descriptive data from the title pages and other 
front matter of documents without creative human input could 
conceivably build a catalog that is adequate for document searches, 
but it would lack important conveniences of conventional catalogs. 

20. Transcription of subject information from titles, contents, indexes, 
etc., of documents without human intervention could not possibly lead 
to a catalog suitable for subject searching unless a device was 
somehow incorporated for detecting and retrieving synonyms. Files 
built up by such transcription might tend to be excessively large and 
to be diluted with trivia not easily distinguished from the major 
subjects covered by the cataloged documents. 

21. As expected, catalog use follows a consistent year-to-year traffic
pattern.

22. Intensity of catalog use is twice as heavy during the academic year 
as during the summer period, but there are no distinct seasonal 
variations within these two periods. Holidays and recesses during 
the academic year are characterized by spikes of lower-than-normal 
catalog use followed by higher-than-normal use. 

23. Catalog use and book borrowing parallel each other in an almost 
constant ratio from week to week. Measurement of catalog use can 
serve as a predictor of borrowing, or vice versa. 

24. Patterns of variation of catalog use by day of the week, hour of the 
day, and fraction of the hour have been detected and described.
Knowledge of these patterns can be applied in planning physical 
facilities and in scheduling reference assistance at the catalog. 

25. There is a strong tendency for catalog use to occur immediately 
after a user’s entrance to the library, as one might expect. 

26. Users of the catalog appear to be divided into two different but 
roughly equal populations on the basis of their length of stay at 
the catalog. The distribution patterns of the two groups exhibit 
modes of about 2 minutes and about 6 or 7 minutes, respectively. 

27. Catalog users respond readily to interviews regarding their intended 
catalog searches. Refusal rates in this type of study are almost
negligible.

28. As a group, graduate students are the heaviest users of the 
catalog, followed closely by undergraduates. Total faculty use is 
light by comparison. One fifth of the total catalog use comes from 
persons not directly associated with the university. 

29. Per capita use of the catalog is somewhat higher for upperclassmen 
than it is for graduate students. Per capita use by freshmen and by 
faculty are equal and are at a level about half that of upperclassmen. 
(This does not take into account faculty use of departmental libraries.)

30. Catalog use throughout the year is relatively constant for faculty.
Use by graduate students declines in the summer to half the academic
year level; use by undergraduates drops to about one sixth during
the summer. Use by visitors doubles during the summer.

The interpretation of these results can vary greatly, depending on
whether a librarian is more interested in expanding service or in conserving
money and labor. Some of the apparent implications for expansion and
for retrenchment are discussed. Improved user orientation is a particularly
attractive approach to improved catalog service.
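
The percentages reported in findings 1, 2, 7, and 11 can be cross-checked with simple arithmetic. The following Python sketch is illustrative only; the variable names are assumptions introduced here and are not taken from the computer programs used in the study.

```python
# Immediate intent at the moment of approach (finding 1), in percent.
immediate = {"document": 73, "subject": 16, "author": 6, "bibliographic": 5}

# Finding 2: 17 percent of all searches are document searches undertaken
# only as a means of reaching subject information.
indirect_subject = 17

# Underlying interest: move those searches from "document" to "subject".
underlying = dict(immediate)
underlying["document"] -= indirect_subject
underlying["subject"] += indirect_subject

# Finding 11: outcome of every 100 attempted document searches.
failures = {
    "not in catalog": 10,
    "in catalog but missed by user": 5,
    "inadequate user clues": 1,
}
failed = sum(failures.values())
succeeded = 100 - failed  # compare with the success rate in finding 7

print(underlying)
print(failed, succeeded)
```

The reclassified figures agree with the 56 percent and 33 percent stated in finding 2, and the three failure causes account for all 16 failures per 100 document searches.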

Introduction



The work reported here is a study of the utilization of the card 
catalog of a very large library, specifically the principal catalog of 
the library system of Yale University. 

The study was motivated by two basic concerns, one of them of a 
long-term, or exploratory, nature, the other of a short-term, or 
operationally supportive, nature. The long-term concern is the question 
of how to design a computerized catalog for a very large library that 
can be expected to give the best possible performance. The short-term 
concern is the question of whether, and, if so, how, existing card catalogs 
in very large libraries may be made more responsive to user requirements. 

It was recognized that a carefully designed study of actual utilization 
of a catalog of a large library could shed useful light in both areas of 
concern.

The connection between the research design and these basic concerns 
is very straightforward. One cannot create an ideal tool of any sort on 
a rational basis (whether that tool be a conventional catalog, a computer- 
ized catalog, or any other device for any other application) without knowing 
a good deal about the purpose or purposes for which the tool is to be used, 
and about the manner in which the users interact with the tool. In the 
literature on libraries, there is a dearth of reliable information on the 
utilization of catalogs. This study was undertaken in an attempt to fill 
the void, at least to a degree. 

In the preceding paragraph, there was no intention of implying
that there have been no published studies of catalog utilization. In
fact there have been many. The point is that their results have not
been reliable. Most past studies have been unreliable because they
were much too small, involving very small samples of actual catalog use.
Usually in these reports the method of selecting a sample of catalog
users is either unstated or clearly of such a nature as to invite slanting
of results. And, in almost all instances, the method of data collection
is suspect, making use of interviews or questionnaires administered after 
(sometimes long after) the instances of catalog use under investigation, 
and therefore inviting gross errors due to faulty human memory. 

The intent of the study reported here was to circumvent the several 
shortcomings which made earlier studies unreliable. The study was designed 
to sample a significant fraction of actual catalog use (approximately one 
percent) over a significant period of time (a full year). The selection 
of the sample of catalog users to be studied was made as representative as 
possible by basing it solely on observed volume of traffic in the catalog 
during different times of the day, days of the week, and seasons of the 
year. Information on needs and approaches was gathered from catalog users 
immediately preceding a catalog search, rather than only after the search. 
The clues with which users began their searches were later compared with 
the search results, with the approaches afforded by the catalog, and with 
alternative search approaches that could conceivably be provided by making 
specific changes in cataloging rules that would take advantage of automatic 
indexing capabilities of computers. 

The research proposal to the Office of Education was approved in July 
1967. Work began in late September 1967. Systematic measurement of traffic
through the catalog was begun in late November 1967 and was continued, with
one brief interruption, through early February 1969. The gathering of usable 
data from interviews began in March 1968 and concluded in April 1969. 

Methods



Environment 

The catalog whose utilization was investigated in this study was 
the public catalog in the Sterling Memorial Library of Yale University. 

The total library collection of Yale University includes more than 5 
million volumes and is divided into some sixty or so units housed in 
various buildings throughout the campus. Sterling Memorial Library 
functions as the main library for Yale University, and it houses the 
largest portion of the total Yale collection, approximately 3.5 million
volumes. The largest of the other library units at Yale (none of them
housing as much as a half-million volumes) are: divinity, law, medicine,
science, rare books, art and architecture, and music. The public catalog
at Sterling Memorial Library provides access to all of the holdings in 
the total Yale collection. It contains full sets of catalog cards for 
books shelved at Sterling Memorial Library; and it contains only main- 
entry cards for books shelved elsewhere at Yale. 

Physically, the public catalog of Sterling Memorial Library is 
located near the front entrance of the building, in a rectangular area 
approximately 60 feet by 40 feet, immediately to the side of the principal 
thoroughfare leading to the circulation desk, stack, main reading room,
reference area, periodical reading room, special collections, offices, etc.
Four aisles of the catalog open from the long side of the catalog area onto 
this thoroughfare. A fifth aisle opens from the short side onto a spur 
corridor which leads from this thoroughfare only to the main reading room. 
The only significant portions of Sterling Memorial Library which may be
reached from the main entrance without passing by the public catalog are 
the undergraduate reserve-book room, a browsing room, and some lavatory 
facilities. However, there is a second entrance to the building which is 
normally open on weekdays until 6 P.M. and which is used a good deal; from 
this alternate entrance, it is possible to reach all portions of the building
except those just mentioned without passing by the catalog. Thus, it was 
not safe to assume that a visitor to the building who has a problem that 
might warrant a catalog search will actually use the catalog, as a matter 
of convenience, before going elsewhere in the building. Neither was it 
safe to assume that the catalog users who enter the catalog through a 
particular aisle are representative of the users who enter through the 
other four aisles that are available, since users coming from the front 
entrance might tend to favor one aisle, users coming from the main reading 
room might tend to favor a different aisle, etc., and each group could
conceivably have significant differences in their requirements. 

The catalog is housed in cabinets that are 14 drawers high. Subject 
entries and name and title entries are all interfiled in a single alpha- 
betic sequence. Subject headings are based on the Library of Congress 
arrangement, differing in only minor respects. Contents of the cards are 
very similar, in data elements and arrangement, to contents of Library of 
Congress catalog cards; indeed, a large fraction of the cards in the catalog 
are modified prints of Library of Congress cards. The catalog is estimated 
to contain approximately 8 million cards. These were housed in about 6,000 
card drawers at the beginning of the study; about half-way through the data- 
gathering period, the catalog was expanded into 7,000 drawers by adding a 
bank of cabinets in the center of the rectangular catalog area. 

It should be noted that Sterling Memorial Library is an open-stack
library. Yale students, faculty, staff, and many outside users holding 
authorization cards, all have the privilege of entering the stack to 
browse and to remove books for borrowing. A page service is available 
to all users during normal working hours. It requires the prior filling
out of a loan form, including the call number of the desired item. This
service is heavily patronized by many users who do not wish to enter the
stack; it is the only option available to the non-Yale visitor who lacks
stack privileges and wishes to consult a book housed in the stack.

Traffic Measurements 

The determination of the pattern of people entering the catalog area 
was a key factor in the later design of an interviewing schedule which 
would yield a clearly representative sample of catalog users. The pattern 
of entry to the catalog was determined by having observers assigned to 
count the number of people entering the catalog area through different 
entryways during different times of day and days of the week. Observers 
were stationed where they could observe simultaneously either the front 
three aisles into the catalog or the rear two aisles into the catalog.
For a period of five minutes' duration, they would count the number of
persons entering each of the aisles being observed. Timing periods were 
rigidly predetermined to cover different hours of the day, different days 
of the week, and even different tenths of each hour. Observation assign- 
ments were rigidly scheduled; the schedule repeated every seven weeks. 
Observations were continued over a total of 62 weeks so as to provide a
10-week overlap period for determination of any annual variation in
traffic which might occur. (During this 62-week period, there was a
5-week interruption in observations, during the late summer, while shifting
of catalog drawers was going on; the abnormal shifting activity tended to
interfere with traffic flow.)

The total amount of time during which traffic was counted was somewhat
over 4 percent of the time that the library was open during the total time 
span involved. For practical reasons, the coverage was more intense during 
weekday working hours (6 percent) and lower during evening hours and weekends 
(about 2.5 percent). However, observed traffic was also lower (by about one- 
fourth) during evening hours and weekends. Tallies of traffic counts by hour, 
day, and entryway for the first 10 weeks of observation were used as the basis 
for designing the interviewing schedule. Traffic counts were continued during 
the interviewing period to check on the continuing validity of the pattern 
observed during those first 10 weeks and to provide a rational basis for
weighting of interview results if the interview schedule should prove to be 
biased with respect to observed traffic. 
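
Because counting covered only a fraction of the hours the library was open, and a different fraction for weekday working hours than for evenings and weekends, sampled counts have to be scaled up separately for each period before they can be compared or summed. The following Python sketch shows one way this could be done. It is an illustration under stated assumptions, not the procedure documented in this report: the function name and the tallies are hypothetical, and only the two coverage fractions come from the text.

```python
def expand_counts(observed, coverage):
    """Scale sampled entry counts up to an estimate for the full period.

    observed: entries counted during the sampled intervals, per stratum
    coverage: fraction of open time that was sampled, per stratum
    """
    return {stratum: observed[stratum] / coverage[stratum] for stratum in observed}


# Coverage fractions stated in the text (6 percent and about 2.5 percent).
coverage = {"weekday": 0.06, "evening_weekend": 0.025}

# HYPOTHETICAL tallies, for illustration only; these are not study data.
observed = {"weekday": 1200, "evening_weekend": 250}

estimate = expand_counts(observed, coverage)
total_estimate = sum(estimate.values())

for stratum, entries in estimate.items():
    print(stratum, round(entries))
print("total", round(total_estimate))
```

Dividing by the coverage fraction within each stratum keeps the more lightly sampled evening and weekend periods from being underrepresented in the total.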

Several other traffic measurements were made in addition to the counts 
of persons entering the catalog area. At precise preassigned times, observers
would follow anyone entering the catalog to observe where he went (which
catalog drawer), how long he stayed at the catalog, and how many call- 
number notations he wrote down. Intervals for conducting these observations 
were scheduled in exactly the same pattern as intervals for gross traffic 
counts, so as to cover all times of catalog availability. 

Observers of catalog traffic were instructed to avoid counting those 
library staff members who regularly work in the catalog area (filers, 
verifiers, reference librarians). The intent of the measurements was to 
count, as far as possible, only the "consumers" of the catalog service,
rather than the suppliers and interpreters.



Interview Schedule

The schedule for conducting interviews with catalog users was based 
on observed traffic into the catalog during an initial 10-week observation 
period. Projection of observed traffic for this period suggested that
annual traffic into the catalog would be of the order of 300,000.
(Full-year traffic observations later showed this estimate to be low.) When one
adjusted this count to omit individuals who were found to be entering the 
catalog merely to use it as a shortcut between the front entrance and the 
main reading room, the indicated annual total of real catalog users was 
closer to 250,000. It was decided that about 2,500 interviews or more 
(i.e., at least something approaching one percent) should be attempted. 
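
The sample-size reasoning in the preceding paragraph amounts to the following arithmetic, shown here as a short Python sketch. The variable names are assumptions introduced for illustration, and the shortcut share is only what the two rounded totals imply, not a figure reported by the study.

```python
projected_entries = 300_000  # projected annual entries into the catalog area
real_users = 250_000         # after omitting people using the area as a shortcut
target_fraction = 0.01       # "something approaching one percent"

# Number of interviews needed to reach the target sampling fraction.
target_interviews = round(real_users * target_fraction)

# Share of entries attributed to shortcut traffic, implied by the rounded totals.
shortcut_share = (projected_entries - real_users) / projected_entries

print(target_interviews)
print(f"{shortcut_share:.1%}")
```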

The interview schedule that was adopted called for an interviewer to 
be at a particular entry to the catalog area at a specified time on a 
specified day of the week. The first individual other than library staff 
to enter the catalog through that entryway during the next six minutes and 
to begin to use the catalog would be the person to be interviewed. (If 
no one entered during that interval, no one was interviewed until the 
next assigned time and place.) Interview assignments were set up on a 
revolving schedule very much like the schedule described above for traffic 
measurements. However, adjustments were made to reflect the observed 
relative traffic volume through each of the five entryways to the catalog, 
and to reflect the observed relative traffic during different hours of the 
day. Adjustments for minutes of the hour and for day of the (regular) 
week were not judged to be necessary. As with traffic measurements, the 
schedule for interviewing during the evening and weekend periods was made
lighter than during the regular weekday periods; this was done with the
knowledge that compensations could be made later by weighting the results
of actual evening and weekend interviews somewhat more heavily than the 
results of weekday interviews in compiling final statistics. 
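
The compensation described above can be sketched as a stratified weighting, in which each period's interviews are weighted by the ratio of its share of traffic to its share of interviews. The Python example below is a minimal illustration of that idea, not the weighting scheme actually used in the study. The function names and all of the numbers are hypothetical.

```python
def stratum_weights(traffic_share, interview_share):
    """Weight each stratum so its interviews count in proportion to its traffic."""
    return {s: traffic_share[s] / interview_share[s] for s in traffic_share}


def weighted_proportion(with_attribute, interviews, weights):
    """Weighted share of interviews that have some attribute of interest."""
    numerator = sum(weights[s] * with_attribute[s] for s in interviews)
    denominator = sum(weights[s] * interviews[s] for s in interviews)
    return numerator / denominator


# HYPOTHETICAL figures, for illustration only; these are not study data.
traffic_share = {"weekday": 0.70, "evening_weekend": 0.30}
interviews = {"weekday": 800, "evening_weekend": 200}
document_searches = {"weekday": 600, "evening_weekend": 130}

total_interviews = sum(interviews.values())
interview_share = {s: n / total_interviews for s, n in interviews.items()}

weights = stratum_weights(traffic_share, interview_share)
weighted = weighted_proportion(document_searches, interviews, weights)
unweighted = sum(document_searches.values()) / total_interviews

print(weights)
print(round(weighted, 3), round(unweighted, 3))
```

In this made-up case the evening and weekend interviews receive the larger weight because they were scheduled more lightly than their share of traffic would warrant, which is the situation the text anticipates.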

Interview Method 

Interview content and technique were designed to elicit quite 
specific information from catalog users, with a minimum amount of bias 
due to prompting or leading by the interviewer. The method adopted 
made use of an interview guide in the form of a multiple-part questionnaire 
(Appendix E) which interviewers were required to follow uniformly. Interviews
would begin with very vague, nondirective questions ("Please tell
me precisely what you were about to do at the catalog the moment I
interrupted you."), in order to give the user full opportunity to state
whatever he happened to regard as important or significant. Only as the
interview progressed would the questions become more specific, so as to 
fill in details which the user had not already supplied but which were 
regarded a priori as important to the study.

The underlying pattern of the interview involved identifying rather 
quickly the basic type of search which the user was about to make in the 
catalog (e.g., a search for the purpose of borrowing a specific known 
document; a search for the identity of documents on a specific subject; 
a search for the identity of documents from a specific source, as a 
particular author or a particular organization; a search for descriptive 
bibliographic information regarding a known document without any intent 
of borrowing the actual document). Identification of this basic type of 
search would then determine which of several possible lines of questioning 
to follow in the remainder of the interview.



When it appeared that no more useful information could be gathered 
regarding the immediate search being conducted by the user, the inter- 
view would be terminated with a series of questions on the user and his 
personal background (but not his name). Background questions related to 
the user's status at Yale, his field of specialization, the length of his
residence in the Yale community, and the general level of his use of 
Sterling Memorial Library and other libraries at Yale. 

Questions asked during the main portion of the interview were intended 
to bring out everything of possible retrieval value that the user knew 
about the material he desired at the time of starting his search. This 
would include, as appropriate, the type of document (whether an ordinary 
book, or a series, periodical, report, etc.), descriptive data (author, 
title, date, publisher, etc.), physical characteristics of a document 
(size, color), contents (index, illustrations, bibliography), subject 
terms, translation specification, edition specification, and so forth. 

The questions also established whether or not the user was already
familiar with the material he wanted, how he had first learned of the 
existence of the material, the connection in which he wanted to make 
use of the material, and the particular clue which he intended to use 
to begin his search in the catalog. Particular pains were taken to
record descriptive data elements exactly as they were known to the user,
taking nothing for granted: if the data came only from his memory, he
was asked to spell out the authors and the longer title words; if the
data came from class notes or duplicated lists which he had brought to 
the library, these were photocopied by the interviewer. 

At the conclusion of the interview, the user was left alone to
complete his catalog search. However, he was observed discreetly from 
a distance. The amount of time spent at the catalog and the number of 
catalog drawers searched were noted on the interview record. As the 
user was leaving the catalog area, he was stopped again and asked 
whether his search had been successful. If the answer was affirmative, 
he was asked to let the interviewer copy any call numbers that he had 
found in the catalog that satisfied his search needs. Users who were 
not certain whether their searches had been successful but who were 
going elsewhere in the library to find out (usually these were people 
who had identified a potentially useful stack area by finding some 
representative class numbers in the catalog and who intended to browse
the stack for known and/or unknown documents) were given a self-mailing 
follow-up form on which they could conveniently note any call numbers 
that were subsequently found to satisfy their needs. 

Several months were spent in developing and testing the interview 
outline and technique before starting the full year’s run of data 
collection for the project. Only very minor changes were made as the 
year progressed. Five individuals performed practically all of the 
interviews. A comparison of the results of interviews conducted by 
different interviewers was made about four months after the start of 
the interviewing year; no serious biasing of results could be associated 
with the interviewers compared. Therefore, the interviewing technique 
was judged to be quite objective, as had been hoped. 

The interviewing schedule that was adopted provided for a maximum 
of some 2700 interviews during the full year studied. Because of 
various random factors (e.g., no user at the catalog at the scheduled time
and place, unexpected library closings, or illness of the interviewer), 
the number of interviews actually completed in the year was 2,134. It is 
interesting to note that fewer than 1 percent of the catalog users who were 
approached refused to grant an interview (and usually only because of hurry 
to get to a class); most of the users interviewed were extremely pleased 
to learn that some people were really interested in their library needs, 
and they answered questions without reluctance. During the interview year 
it was inevitable that some individuals would be interviewed more than once, 
purely by chance; such instances were identified by one of the routine 
interview questions and noted in the interview records. 

A large multi-library survey published in 1938 (1) used the technique 
of accompanying catalog users through their searches. The study reported here 
uses the technique of interviewing catalog users before the start of a search 
and later ascertaining the results of the search. An earlier independent 
application of this method occurred in an unpublished thesis project at the 
University of Chicago Harper Library (2); in that project, 100 searches for 
particular documents ("known item" searches) were studied. More recently, a 
similar but much larger study of "known item" searches at both public and 
university libraries in Ann Arbor, Michigan, was reported (3).

Catalog Card Follow-Up 

Almost all of the catalog searches which users regarded as successful were 
searches which resulted in the identification of documents or catalog cards 
bearing specific call numbers. By looking up each call number in the
shelf-list card file, it was comparatively simple for the project staff to
obtain a photocopy of the basic catalog card for each item associated with a
successful catalog search. These photocopies were attached to the respective
interview records for later use in comparing search clues brought to 
the catalog by the user with data and access points available in the 
catalog. 

Book "Front Matter" Follow-Up

Follow-up on call numbers identified in successful searches extended 
not merely to catalog cards but also to the actual books which the call 
numbers represented. After allowing a period of several weeks, at least, 
for the user to finish with the items he identified, these books would 
be borrowed from the library shelves (or recalled) and examined by the 
project staff. Certain non-central portions of these books were
photocopied when present and not redundant, including (but not necessarily
in all cases) cover, title page, verso of title page, table of contents, 
preface, brief introduction, and index. This photocopied material was 
also attached to the respective interview records, to be used later for 
comparing search clues brought to the catalog by the user with potentially 
matching data elements that are conveniently available to a cataloger 
(or to a hypothetical optical-scanning device that could conceivably 
be substituted for a human cataloger in the fanciful future).

Miscellaneous Measurements 



A number of miscellaneous measurements were made which are related 
to the understanding of the needs of catalog users. 

In the case of known-document searches which failed, a second
catalog search attempt was made by the project staff, using the clues
supplied by the user before his unsuccessful attempt. In a number of
cases, the item wanted was clearly identified in this second search,
indicating either a user error or inadequate familiarity with the 
catalog arrangement. 

Statistics were compiled on the distribution of catalog entries 
in the Sterling Memorial Library public catalog in terms of the first 
two letters of the file term. Similar statistics were compiled for 
the first two letters of call numbers of items identified in successful 
catalog searches. Another compilation was made for the first three 
letters of catalog headings under which items were located. Statistics 
were compiled on observed use of individual catalog drawers. 

Two different published formulas for achieving retrieval from 
computerized files despite spelling errors in the data to be searched 
were tested for effectiveness in catalog searching by applying the 
formulas to user clues and catalog-card data from successful document 
searches studied during the interviewing phase. One of the formulas 
tested attempted to negate misspellings by truncating the words to be 
matched; the other formula attempted to achieve the same result by 
applying specific rules for condensing the words to be matched. 
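
As a rough illustration of the two approaches, the sketch below contrasts a truncation key with a condensation key. It does not reproduce either published formula: the four-letter cutoff, the vowel-dropping and doubled-letter rules, and the function names are all assumptions chosen only to show the general idea.

```python
import re


def truncate_key(text, n=4):
    """Truncation approach: keep only the first n letters of each word."""
    words = re.findall(r"[a-z]+", text.lower())
    return " ".join(w[:n] for w in words)


def condense_key(text):
    """Condensation approach (illustrative rules only): collapse doubled
    letters, keep the first letter, and drop later vowels and 'y'."""
    words = re.findall(r"[a-z]+", text.lower())
    keys = []
    for w in words:
        collapsed = re.sub(r"(.)\1+", r"\1", w)         # "mm" -> "m"
        tail = re.sub(r"[aeiouy]", "", collapsed[1:])   # drop vowels after first letter
        keys.append(collapsed[0] + tail)
    return " ".join(keys)


def matches(user_clue, catalog_entry, key):
    """Two strings match when their keys are identical."""
    return key(user_clue) == key(catalog_entry)


# A late misspelling survives truncation; an early doubled letter does not,
# but is absorbed by the condensation rules assumed here.
print(matches("Tolstoi", "Tolstoy", truncate_key))
print(matches("Hemmingway", "Hemingway", truncate_key))
print(matches("Hemmingway", "Hemingway", condense_key))
```

Under these assumed rules, truncation tolerates errors near the end of a word, while condensation tolerates vowel substitutions and doubled consonants anywhere after the first letter.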

Data Reduction 

The records obtained from the thousands of interviews and the 
thousands of traffic measurements described above were extremely 
voluminous. Computer methods were invoked to assist in their analysis. 
Codes and formats were developed for keypunching much of the collected 
data on IBM cards. Virtually all of the data obtained from traffic 
measurement were susceptible to this treatment; data from a given
measurement instance could be recorded on a single punched card.
Interview data, however, were much less tractable. For example, it was 
not generally feasible to keypunch every detail of information about 
a desired book supplied by the user, or available from the catalog card. 
But it was possible to keypunch indications as to whether or not 
certain types and ranges of information were available from the user
or the catalog card. Some aspects of accuracy and correspondence
could also be indicated. The selection of information characteristics
to recognize in the punched card format largely reflects the judgment
and intuition of the research group. Many descriptive
characteristics that might have been reduced to punched card input
were not. However, basic records were preserved to permit further
analysis of data by manual methods or further reduction of data to
machine-usable form.

Data on punched cards were analyzed mainly by the use of table, 
or matrix, programs provided by the Yale Computer Center. With these 
programs, the computer will take data concerning any two specific 
variables in the body of data supplied to it and will print out a 
table, or matrix, showing the co-occurrence of these variables in
terms of any individual value or any specific range of values for 
each of the two variables. Totals are provided for each row of 
figures and for each column of figures. The tables or matrixes can 
be made to show either raw data for each row-column position, or 
percentages of total populations for each row or column involved. 
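
The kind of two-variable table described above can be sketched in a few lines. This is not the Yale Computer Center program; the function names, the record layout, and the four sample records are invented for illustration.

```python
from collections import Counter


def cross_tab(records, row_var, col_var):
    """Count co-occurrences of two coded variables, with row and column totals."""
    cells = Counter((r[row_var], r[col_var]) for r in records)
    rows = sorted({k[0] for k in cells})
    cols = sorted({k[1] for k in cells})
    table = {r: {c: cells.get((r, c), 0) for c in cols} for r in rows}
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    return table, row_totals, col_totals


def row_percentages(table, row_totals):
    """Express each cell as a percentage of its row total."""
    return {
        r: {c: 100.0 * n / row_totals[r] for c, n in cells.items()}
        for r, cells in table.items()
    }


# Invented sample records, standing in for keypunched interview data.
sample = [
    {"search_type": "document", "user_group": "graduate"},
    {"search_type": "document", "user_group": "undergraduate"},
    {"search_type": "subject", "user_group": "graduate"},
    {"search_type": "document", "user_group": "graduate"},
]
table, row_totals, col_totals = cross_tab(sample, "search_type", "user_group")
print(table)
print(row_totals, col_totals)
print(row_percentages(table, row_totals))
```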

One can specify that one table of each kind be printed. Furthermore, 
a third variable can be brought into the program and the computer can 
be required to print out a series of subsidiary tables or matrixes, 
each containing just that portion of the data regarding the first two
variables that applies to a particular value, or range of values, of 
the third variable. It takes very little arithmetic to determine that 
the number of tables which could conceivably be produced from a body 
of data involving scores of variables, as in this study, is quite 
astronomical and quite beyond the budget of most research projects. 
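
The arithmetic can be made concrete with a short count. The figure of 40 variables is taken only as a round illustration of "scores of variables"; the function names are hypothetical.

```python
from math import comb


def two_variable_tables(n_vars):
    """Number of distinct unordered pairs of variables."""
    return comb(n_vars, 2)


def three_variable_series(n_vars):
    """Each pair of variables, subdivided by any one of the remaining variables."""
    return comb(n_vars, 2) * (n_vars - 2)


n = 40  # illustrative round number
print(two_variable_tables(n))    # pairs of variables
print(three_variable_series(n))  # series of subsidiary tables
```

Each series in the second count contains one subsidiary table per value (or range of values) of the third variable, and every table can be printed in raw and in percentage form, so the number of printed pages grows faster still.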

If all possible three-variable correlations were actually printed out, 
it is doubtful that anyone would have the time and energy to study them. 
Therefore, only a limited number of the more promising tables was printed, 
mostly of the two-variable type. 

The selection of combinations of variables to be represented in 
tables or matrixes was based upon the results of still another computer 
manipulation. The randomness or nonrandomness of correlation (covariance) 
of each pair of variables in the data base was determined by a statistical 
program which printed out a short table showing the measure of degree of 
correlation for each pair. Pairs of variables showing relatively strong 
degrees of correlation were easily identified from this table; these 
were generally the variables selected for detailed elucidation by means 
of the table program described above. More than 40 variables (Appendix F) 
were compared for covariance. Over a hundred potentially interesting 
cases of covariance were identified; as yet, not all of these have been 
examined in detail. 
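
A screening pass of this kind might look like the sketch below. The report does not name the statistic that was used, so the Pearson coefficient, the 0.3 threshold, and the demonstration columns are all assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / sqrt(sxx * syy)


def screen_pairs(columns, threshold=0.3):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold, strongest first."""
    hits = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            hits.append((a, b, r))
    return sorted(hits, key=lambda t: -abs(t[2]))


# Invented coded columns: "a" and "b" move together, "c" is unrelated to both.
demo = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, -1, -1, 1]}
hits = screen_pairs(demo)
print(hits)
```

Only the pairs that pass the screen would then be sent to the table program for detailed printing.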

Results



Publications, Papers

This report is final in the sense that it terminates the support 
grant under which data were collected and processed. It does not 
nearly exhaust the possibilities for useful analysis of the very large
and multifaceted data collection that has been assembled. Results 
presented below represent only a first skimming of the data — an 
attempt to derive quick answers, where possible, to some of the more 
obvious and important questions about catalog use and catalog needs. 

It is hoped that the opportunity and means will be found in the future 
to make further use of the excellent data collection in order to gain 
better quantitative understanding of the questions considered in 
this report, and in order to answer many additional questions that 
have yet to be considered. 

Preceding this report, there were four publications (4-7)
resulting from this study; they are included here as Appendixes A,
B, C, and D. Two of these publications (4, 5) are primarily
descriptions of the design of the study, presenting very few and very
preliminary findings. The other two publications (6, 7) present
samplings of data on the number and types of errors found in the clues
with which catalog users begin their searches; these data are used to
assess the values of two different automatic searching algorithms
which have been proposed by other authors for achieving retrieval from
computerized bibliographic files despite inaccuracies in the data to
be matched by the computer. An informal presentation on methods and
results of this study was scheduled to be presented at the Gordon
Research Conference on Problems in Scientific and Technical Information,
Colby Junior College, New London, New Hampshire, July 12-17, 1970.

Table 1

Traffic in Catalog, Weekly

       First         Library Open Hours            Cat. Users     Users
Week     Day      Day      Eve   Wkend    Total  (by Extrap.)  Per Hour
   1  671127     42.5    33.75    18.0    94.25         9,867    104.59
   2  671204     42.5    33.75    18.0    94.25        10,028    106.10
   3  671211     42.5    33.75    3.25     79.5         6,923     87.08
   4  671218    41.25        -       -    41.25         3,028     73.41
   5  671225     33.0        -       -     33.0         3,491    105.79
   6  680101    33.75    20.25    18.0     72.0         7,719    107.21
   7    0108     42.5    33.75    18.0    94.25         8,722     92.45
   8    0115     42.5    33.75    18.0    94.25         7,979     84.58
   9    0122     42.5    33.75    18.0    94.25         8,112     85.99
  10    0129     42.5    33.75    18.0    94.25         9,410     99.75
  11    0205     42.5    33.75    18.0    94.25         9,908    105.02
  12    0212     42.5    33.75    18.0    94.25         9,996    105.96
  13    0219     42.5    33.75    18.0    94.25         9,312     98.71
  14    0226     42.5    33.75    18.0    94.25         8,461     89.69
  15    0304     42.5    33.75    18.0    94.25         9,636    102.14
  16    0311     42.5    33.75    3.25     79.5         7,618     95.82
  17    0318    41.25        -       -    41.25         4,563    110.61
  18    0325    41.25        -       -    41.25         4,705    114.06
  19    0401     42.5    33.75    18.0    94.25         9,249     98.04
  20    0408     34.0     27.0    18.0     79.0         6,352     80.41
  21    0415     42.5    33.75    18.0    94.25        10,049    106.52
  22    0422     42.5    33.75    18.0    94.25         9,637    102.15
  23    0429     42.5    33.75    18.0    94.25         9,056     95.99
  24    0506     42.5    33.75    18.0    94.25         9,500    100.70
  25    0513     42.5    33.75    18.0    94.25         8,159     86.49
  26    0520     42.5    33.75    18.0    94.25         7,288     77.25
  27    0527     42.5    33.75    13.0    94.25         5,728     60.72
  28    0603     42.5    33.75    18.0    94.25         4,660     49.40
  29    0610     42.5    33.75    18.0    94.25         4,051     42.94
  30    0617     42.5    33.75    18.0    94.25         2,408     25.52
  31    0624     42.5    33.75    18.0    94.25         4,660     49.40
  32    0701     34.0     19.0    13.0     69.0         2,772     40.17
  33    0708     42.5    33.75    18.0    94.25         4,375     46.38
  34    0715     42.5    33.75    18.0    94.25         2,841     30.11
  35    0722     42.5    33.75    18.0    94.25         4,184     44.35
  36  680729     42.5    33.75    18.0    94.25         3,302     35.00
  37    0805     42.5    33.75    18.0    94.25         3,600     38.16
  38    0812     42.5    33.75    18.0    94.25         2,660     28.20
  39    0819     42.5    33.75    18.0    94.25         2,853     30.24
  40    0826     42.5    33.75    18.0    94.25       (3,599)     38.15
  41    0902     34.0     19.0    13.0     66.0       (2,998)     42.39
  42    0909     42.5    33.75    18.0    94.25       (3,599)     38.15
  43    0916     42.5    27.75    18.0    88.25       (8,496)     95.14
  44    0923     42.5    33.75    18.0    94.25       (9,074)     96.18
  45    0930     42.5    33.75    18.0    94.25         9,260     98.16
  46    1007     42.5    33.75    18.0    94.25         9,948    105.49
  47    1014     42.5    33.75    18.0    94.25         8,336     88.36
  48    1021     42.5    33.75    18.0    94.25         9,772    103.58
  49    1028     42.5    33.75    18.0    94.25         9,678    102.59
  50    1104     42.5    33.75    18.0    94.25         9,961    105.59
  51    1111     42.5    33.75    18.0    94.25        10,179    107.90
  52    1118     42.5    33.75    18.0    94.25         9,586    101.61
  53    1125     33.5     13.5    9.75    56.75         5,916    104.25
  54    1202     42.5    33.75    18.0    94.25        12,627    133.85
  55    1209     42.5    33.75    18.0    94.25        11,012    116.63
  56    1216     42.5    33.75    3.25     79.5         9,091    114.34
  57    1223     33.0        -       -     33.0         2,772    132.00
  58    1230     33.0        -       -     33.0         3,218     97.52
  59  690106     42.5    33.75    18.0    94.25        12,215    129.38
  60    0113     42.5    33.75    18.0    94.25         9,167     97.17
  61    0120     42.5    33.75    18.0    94.25         8,450     89.57
  62    0127     42.5    33.75    18.0    94.25         8,219     87.12

Totals         2561.5   1814.0   945.5   5321.0       444,035
1-53 total     2198.0  1577.75  834.25   4610.0       367,264     79.67
  average                                 86.98         6,930
10-62 total    2198.5  1591.25  834.25   4624.0       378,166     81.78
  average                                 87.25         7,135
1-9 total       363.0   222.75  111.25    677.0        65,869     97.30
  average                                 75.22         7,319
54-62 total     363.5   236.25  111.25    711.0        76,771    107.89
  average                                  79.0         8,530

Traffic 

Seasonal variation of traffic in the catalog was studied by 
determining weekly figures for traffic into the catalog area. This 
was done by taking the average number of users (catalog entrants) per 
observation period actually counted during that week and then multiplying 
by the number of equivalent periods during which the library was open 
during that week. In making this calculation, observations for weekday 
business hours (to 5 P.M.) were considered separately from observations 
for weekday evenings (to 11:45 P.M.) and observations for weekends 
(Saturdays and Sundays) . This was necessary because these three time 
periods were sampled in different proportions. The indicated total 
traffic into the catalog area in a week was the total of separate 
calculations for the three time periods. 
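
The calculation just described can be sketched as follows. The length of an observation period is not stated in this section, so it is left as a parameter; the half-hour value and the counts in the example are hypothetical, while the open hours are those of a typical week in Table 1.

```python
def weekly_traffic(strata, period_hours):
    """Extrapolate weekly catalog entrants from sampled counts.

    strata maps a time-period name to (counts, open_hours), where counts is
    the list of entrants tallied in each observation period that week and
    open_hours is the number of hours the library was open in that period.
    """
    total = 0.0
    for counts, open_hours in strata.values():
        if not counts:
            continue  # no observations: this stratum contributes nothing
        mean_per_period = sum(counts) / len(counts)
        equivalent_periods = open_hours / period_hours
        total += mean_per_period * equivalent_periods
    return total


# Hypothetical counts per half-hour observation period.
example_week = {
    "weekday_day": ([60, 70], 42.5),
    "weekday_evening": ([20, 30], 33.75),
    "weekend": ([40], 18.0),
}
estimate = weekly_traffic(example_week, period_hours=0.5)
print(estimate)
```

Keeping the three strata separate matters because a pooled average would overweight whichever time period happened to be sampled most heavily.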

Weekly library hours and calculated traffic are shown in Table 1. 
Calculated weekly traffic is plotted in Figure 1. Also plotted in 
Figure 1 is the weekly average of catalog traffic per hour that the 
library was actually open during that week. Weeks during which hours 
were restricted or during which the library was closed for a holiday 
are indicated. Since no traffic counts were made or extrapolated for 
the very-low-traffic periods before 9 A.M. and after 10 P.M., the
figures plotted in Figure 1 are systematically distorted to slightly
less than their real values. The five-week late-summer gap during which
no traffic measurements were made has been filled in by extrapolating
the curves from either side of the gap; the curve shape thus supplied
to this gap is completely consistent with casual observations made by
the research team during this period and with the past experience of
Yale librarians who have witnessed the rapid back-to-school build-up
of library use at the start of a new school year.

Figure 2. Weekly Borrowing Traffic (horizontal axis: Week of Traffic Measurement)

Figure 1 indicates that there are only two significant seasons
for catalog use: the regular academic year, and the summer vacation
period. During the summer period, activity is reduced to half the
level for the regular academic year. (At Yale, various short summer
courses are offered, but primarily for persons other than full-time
Yale students. Virtually all undergraduates and many graduate students
and faculty members are absent during the summer.) The activity
pattern of the regular academic year is punctuated by irregular declines
that are associated with holidays and recesses (as indicated by reduced
library hours). This shows up clearly in the curve for total users per
week. Because of shortened library hours, the curve for users per hour
does not necessarily drop when the total user curve drops (see, e.g.,
the Easter recess period, weeks 17 and 18). The academic year pattern
takes a full month to drop off into the summer pattern and somewhat less
than a month to be resumed.

It is interesting to compare catalog traffic throughout the year 
with statistics for the borrowing of books at Sterling Memorial Library 
during the same time span. Circulation statistics supplied by the 
Circulation Department are plotted in Figure 2. These show all recorded 
book loans (whether for outside or in-building use; and whether borrowed 
by students, faculty, library staff, or others). It can be seen that
this curve follows the curve for catalog users in Figure 1 in most details.
In fact, it appears that one can predict circulation rather accurately
from a knowledge of catalog traffic and vice versa. This could be of some
value to reference librarians and circulation librarians in scheduling
their staffs. A discrepancy occurs at New Year, when book borrowing is
somewhat heavier than catalog traffic would suggest; a similar discrepancy
occurred in week 20 (Martin Luther King's funeral) when borrowing remained
unchanged although catalog traffic declined sharply. At the beginning of
the Fall semester, week 46, borrowing rises more sharply than the number
of catalog users. There are only minor disparities during the summer
period.

In general, the match of these curves is quite remarkable. However, 
there is no intention to imply here that book borrowing results solely 
and immediately from catalog use. Browsing is known to occur, and can 
result in formal borrowing. Catalog users can identify books and not 
borrow them until days or weeks later. Catalog users can use the catalog 
for purposes other than obtaining books. Yet, the interesting point is 
that all of these phenomena, if important, tend to even themselves out, 
leaving book borrowing and catalog traffic directly proportional to each 
other throughout virtually the entire year. 

Both Figures 1 and 2 include annual overlap periods of about 10 weeks.
In both cases, there is very great similarity in the pattern from one
year to the next during this overlap period. Some variation in pattern
might be attributable to the holidays of Christmas Day and New Year's Day,
which fell on Monday in 1967/8 and on Wednesday in 1968/9, possibly
causing different amounts of travel by potential library users in these
respective years. Comparison of users in this overlap period suggests
an annual growth rate of 10 percent in library use, but this particular
time of year seems unreliable for predictions because of the differences
in the holidays. Comparison of book borrowing figures in the overlap
period also suggests an annual growth rate, but a smaller one of about
5 percent. It is interesting, however, that circulation figures for the
successive July-to-June fiscal years 1967-1968 and 1968-1969 indicate
no change in annual circulation; the total was 370,000 volumes borrowed
in each year.
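
The overlap comparison can be worked through from the summary rows of Table 1 for weeks 1-9 and weeks 54-62. Which basis the report used for its 10 percent figure is not stated; the sketch below simply computes both, and the function name is hypothetical.

```python
def annual_growth(earlier, later):
    """Fractional change from the earlier overlap period to the later one."""
    return later / earlier - 1.0


# Table 1 summary rows: weeks 1-9 versus weeks 54-62.
per_hour_growth = annual_growth(97.30, 107.89)  # users per open hour
total_growth = annual_growth(65869, 76771)      # extrapolated total users

print(f"{per_hour_growth:.1%} per open hour, {total_growth:.1%} in total users")
```

On these figures the per-hour comparison comes out near 11 percent and the comparison of totals near 17 percent, the difference reflecting the longer opening hours in the second overlap period. The per-hour basis is therefore the one that sits closer to the 10 percent quoted in the text.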

The total number of catalog users during a year, as indicated by 
traffic measurements (Table 1) appears to be 370,000 or 380,000, depending 
upon which overlap period one chooses in defining a year. The fact that 
this number is the same as the total of books borrowed is probably a 
coincidence; there is no reason to infer that each person entering the 
catalog seeks and borrows a single volume. It is well known that that 
is not the case. It should be noted that the measurements consistently 
ignored traffic during periods just after library opening in the morning 
and before closing late at night when traffic is very light. If these 
periods were included, it is estimated that the total number of persons 
entering the catalog area in a year would be about 400,000 or slightly 
less. Furthermore, it should be remembered that a substantial number of 
people who enter the catalog area merely walk through it as a short-cut 
between the front entrance and the main reading room. Observations 
indicated this to be about 80,000 "walkthroughs" in a year. Thus, the 
number of persons actually consulting the catalog in a year was of the 
order of 320,000. 

The geometry of the catalog area was discussed above, and it was
mentioned that there are five entryways into the catalog area. On the
basis of gross traffic measurements, it was ascertained that the two
entryways farthest apart (the one nearest the front entrance and the one
nearest the main reading room) receive the most entrants. Traffic into
these entryways is about equal. The three intermediate entryways have
gross entry traffic ranging from one-half to two-thirds this level.
However, the pattern changes if one ignores the people who walk through
the catalog area without using the catalog. The "walkthroughs" enter 
mainly through the two portals nearest the main reading room. The rate 
of entry of actual catalog users is clearly highest through the portal 
nearest the front door. Entry of actual users through all other portals 
is roughly the same, and is about half the rate for the portal nearest 
the front door. This finding suggests that there is a strong tendency 
for catalog use to be undertaken immediately upon entering the library 
by those who use the catalog at all. 

A clear pattern of catalog traffic variation with day of the week 
was observed, and is shown in Table 2. Figures are derived from
measurements made between the hours of 9 A.M. and 10 P.M. It can be seen that
the catalog use rate is heaviest during the early part of the week, 
especially on Tuesdays, and that it is lowest on Saturdays and Sundays, 
as one would expect. The grand mean use rate for hours in this range 
that the library was open is 95.9 entrants per hour. The span of 
deviations from this mean from the busiest day (Tuesday) to the slowest 
day (Saturday) is 26 percent of the mean. The percentages shown in the 
table refer to entrants per hour and not to total users in a day; the 
library was open only half as many hours on the average Saturday or
Sunday as on the average weekday.

Table 2

Variation in Catalog Attendance by Day of Week

                  Entrants per hour   Percent of Yearly Average
Monday                  100.4                   104.7
Tuesday                 106.5                   111.1
Wednesday               102.0                   106.4
Thursday                 95.0                    99.1
Friday                   92.6                    96.6
Saturday                 81.7                    85.2
Sunday                   82.5                    86.0
Yearly Average           95.9
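
The percentages in Table 2 and the 26 percent span quoted in the text follow directly from the hourly rates, as the short check below shows. The Monday rate, partly illegible in the source, is taken here as 100.4, the value consistent with its printed percentage of 104.7; the variable names are hypothetical.

```python
entrants_per_hour = {
    "Monday": 100.4,   # reconstructed value, see note above
    "Tuesday": 106.5,
    "Wednesday": 102.0,
    "Thursday": 95.0,
    "Friday": 92.6,
    "Saturday": 81.7,
    "Sunday": 82.5,
}
yearly_average = 95.9

# Each day's rate as a percentage of the yearly mean rate.
percent_of_average = {
    day: round(100.0 * rate / yearly_average, 1)
    for day, rate in entrants_per_hour.items()
}

# Span from busiest to slowest day, as a percentage of the mean.
rates = entrants_per_hour.values()
span_percent = 100.0 * (max(rates) - min(rates)) / yearly_average

print(percent_of_average)
print(round(span_percent))
```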



Catalog traffic varies greatly from hour to hour during the day.
Average hourly traffic for an entire year is plotted in Figure 3.
Variations show up more sharply if one plots only the average hourly
traffic for weekdays (Figure 4). It can be seen that the rate of
influx of catalog users builds to a morning peak at about 11:15, drops
off through the early lunch period, then builds rapidly to a maximum
just before 2 P.M., remains high until 4 P.M., drops rapidly to an extremely
low level just after 6 P.M., rises again after 6:30, but not very much,
and finally drops off again from 8 P.M. until closing time.

Data were collected which permitted articulation of use rate variations 
at 6-minute intervals. Such articulation produces few surprises. It 
shows, during weekdays, a clear build-up of entry rate in the 12 minutes
immediately after the hour in the morning and mid-afternoon (as users
arrive from classes). From 10 A.M. to 5 P.M. there is a tendency for the
entry rate to drop off shortly before the hour (except at 1 P.M.). At
5 P.M. sharp and 6 P.M. sharp, there are rapid declines in entry rate
which are shown clearly by the data. At 7 P.M. sharp and 8 P.M. sharp
there are rapid rises in entry rate. The range from highest to lowest 
entry rate during the five 6-minute intervals in a half-hour period is 
generally about 25 percent to 35 percent of the average entry rate for 
that period. 

Figure 3. Yearly Total Catalog Traffic by Half Hour of the Day

Figure 4. Yearly Weekday Catalog Traffic by Half Hour of the Day

Horizontal axis: Time of Day

All of the foregoing discussion of catalog traffic referred to the
rate at which people entered the catalog area. How long people tend to
stay in the catalog area is also a valid question. Observation of
non-users ("walkthroughs") permitted estimation of the correct number
of real users of the catalog. Determination of the amount of time which 
real users spend at the catalog was also attempted through a regular 
sampling program in which the first users entering the catalog during 
specified 8-minute intervals were followed in order to note the amount 
of time spent in catalog use. At the end of the 8-minute period, a user 
(who could have entered the catalog at any time during the interval) might 
still be using the catalog, but the timing was broken off with only the 
notation that the catalog use was incomplete. Reconstruction of the profile 
of duration of catalog use from such data is a difficult matter, but 
not hopeless. The profile that emerges shows a peak for the most frequent 
catalog use period at 2 minutes. There are only about half as many 1-minute 
users. The number drops off from the 2-minute peak to about half of the 
peak value at the 4-minute interval; it decays slowly to a negligible 
value for intervals beyond a half-hour or so. Since there were no actual 
measurements of the longer intervals, it should be understood that this 
description is somewhat hypothetical. It is based on the assumption that 
there are two different normal populations of catalog users, with modes 
centering around the 2-minute and the 6- or 7-minute intervals. Such 
an assumption fits well with the data actually observed. The standard 
deviation for the second group is much broader than for the first, although 
the actual populations seem to be of roughly the same size. 

Traffic statistics revealed no flaws in the design of the interview 
sampling method adopted in this study. On the contrary, the complexity 
of the traffic pattern strongly justifies the original decision to conduct 
interviews throughout an entire year, with representative coverage of 
different days of the week, hours of the day, periods within the hour, 
and portals of entry to the catalog area. The interviewing schedule was
deliberately non-representative in its coverage of the summer period as
compared to the academic year, and in its coverage of evening and weekend
hours as compared to weekday working hours. But simple weighting factors
could be, and were, applied to the interview results to make them completely
representative of catalog use with respect to season, day of week, time of
day, portion of the hour, and portal of entry to the catalog area. These
results are discussed in the sections that follow.
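
One common way to build such weighting factors is to scale each stratum by the ratio of its share of traffic to its share of interviews. The report does not give its actual factors, so the sketch below is only a generic illustration; the function names, strata, and numbers are invented.

```python
def stratum_weights(traffic_by_stratum, interviews_by_stratum):
    """Weight = (stratum share of traffic) / (stratum share of interviews)."""
    total_traffic = sum(traffic_by_stratum.values())
    total_interviews = sum(interviews_by_stratum.values())
    weights = {}
    for stratum, n in interviews_by_stratum.items():
        if n == 0:
            continue  # nothing to weight in an unsampled stratum
        traffic_share = traffic_by_stratum[stratum] / total_traffic
        sample_share = n / total_interviews
        weights[stratum] = traffic_share / sample_share
    return weights


def weighted_share(interviews, weights, predicate):
    """Weighted fraction of interviews satisfying predicate.

    interviews is a list of (stratum, record) pairs.
    """
    matched = sum(weights[s] for s, rec in interviews if predicate(rec))
    total = sum(weights[s] for s, rec in interviews)
    return matched / total


# Invented example: summer is oversampled relative to its traffic.
weights = stratum_weights(
    {"academic": 300, "summer": 100},
    {"academic": 50, "summer": 50},
)
interviews = [
    ("academic", "document"),
    ("academic", "subject"),
    ("summer", "document"),
    ("summer", "document"),
]
share = weighted_share(interviews, weights, lambda rec: rec == "document")
print(weights, share)
```

In this invented case the unweighted share of document searches would be 0.75, while the weighted share is lower because the oversampled summer interviews count for less.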

Catalog Users 

During the period March 25, 1968, through April 5, 1969, the interviewing 
schedule specified times and catalog portals for a maximum of 2699 possible 
interviews. The actual number of interviews conducted during this period 
was 2134, constituting approximately two-thirds of a percent of the estimated
320,000 persons who actually made use of the catalog during the same period.

Of the 565 scheduled times in which no interview was conducted, 384 are
accounted for simply because no catalog users appeared at those times
(six-minute intervals). Another 161 instances are due to illness or inadvertent
absences of the interviewers or to misunderstandings regarding assignments.
Only 20 resulted from the refusal of catalog users to grant interviews
(less than 1 percent of the catalog users approached).
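
The figures in the two preceding paragraphs can be checked against one another with a few lines of arithmetic. Treating the number of users approached as completed interviews plus refusals is an assumption made here for the refusal rate; the variable names are hypothetical.

```python
scheduled = 2699
completed = 2134
no_user, interviewer_absent, refused = 384, 161, 20

# Scheduled slots without an interview, and their three stated causes.
missed = scheduled - completed
accounted_for = no_user + interviewer_absent + refused

# Share of the estimated 320,000 actual catalog users who were interviewed.
estimated_users = 320_000
sampling_percent = 100.0 * completed / estimated_users

# Refusals as a share of users approached (assumed: completed + refused).
refusal_percent = 100.0 * refused / (completed + refused)

print(missed, accounted_for)
print(round(sampling_percent, 2), round(refusal_percent, 2))
```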

The extremely high degree of cooperation of catalog users was very
gratifying, and it lends extra credibility to the findings of the study.
Most interviewees were very pleased to be asked about their needs, and
discussed them without much probing by the interviewer. Most of those who
refused to be interviewed did so apologetically, explaining that they would
be late for classes or appointments.

Table 3, based on adjusted data from interviews, shows the degrees
of catalog use by the various groups within the academic community.
Graduate students dominate over other distinguishable groups in terms of
absolute use. However, in terms of use as related to size of the eligible
group population, the Yale upperclassmen show somewhat greater use of
the catalog than do graduate students. Yale faculty rank below
undergraduates and graduate students in terms of absolute use of the catalog.
In relative use they are probably comparable to freshmen (if one defines
faculty to comprise a group of about 2000); but it should be remembered
that, unlike freshmen, faculty members can make heavy use of departmental
libraries, can use assistants for library work, and can often go directly
to desired subject areas in the Sterling Memorial Library stack without
consulting the catalog. Other Yale employees rank below faculty, and
wives and family of faculty rank last in use among people connected with
the university. However, non-Yale students, non-Yale faculty, and other
"outside" users account for a total of 19 percent of catalog use, a very
respectable proportion.

The cross-section of catalog users varies with season. During the 
summer period, there is, understandably, relatively less use by Yale 
groups as compared to visitors, and less by undergraduates as compared to 
graduate students. Relative use by faculty and staff doubles in the 
summer period over the academic year. But since the rate of use of the 
catalog falls to half of the academic level during the summer, this
merely means that total use per week by faculty and staff remains just
about constant throughout the calendar year. Use by graduate students,
although constituting a steady relative proportion throughout the year,
actually falls in the summer to half of the academic year level in terms 
of instances of use per week. Actual instances of use per week by 
visitors doubles in the summer period. Use per week by non-Yale 
faculty remains about the same; and use per week by non-Yale students 
declines, but not as sharply as use per week by Yale undergraduates. 

Use per week by faculty family increases by half in the summer. 
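
The distinction drawn above between relative share and absolute use can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the weekly total and the academic-year shares are hypothetical placeholders, not figures from the study. Only the ratios applied to them (summer traffic at half the academic-year rate, the faculty and staff share doubling, the graduate share holding steady) are taken from the text.

```python
# Hypothetical academic-year figures (NOT from the study); only the ratios
# applied below are taken from the text.
ACADEMIC_USES_PER_WEEK = 1000.0   # assumed total catalog uses per week
SUMMER_RATE_FACTOR = 0.5          # summer traffic is half the academic level


def weekly_uses(share, total):
    """Absolute uses per week for a group holding `share` of total traffic."""
    return share * total


summer_total = ACADEMIC_USES_PER_WEEK * SUMMER_RATE_FACTOR

# Faculty and staff: relative share doubles in summer (assumed 10% -> 20%).
faculty_academic = weekly_uses(0.10, ACADEMIC_USES_PER_WEEK)
faculty_summer = weekly_uses(0.20, summer_total)

# Graduate students: share stays steady (assumed 40% in both periods).
grad_academic = weekly_uses(0.40, ACADEMIC_USES_PER_WEEK)
grad_summer = weekly_uses(0.40, summer_total)

print(faculty_academic, faculty_summer)  # the two values agree
print(grad_academic, grad_summer)        # summer value is half
```

Whatever placeholder values are chosen, a doubled share of a halved total leaves absolute use unchanged, and a steady share of a halved total cuts absolute use in half.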

Catalog Searches

Four basic types of catalog search objectives were identified in 
the study; they have been designated as document search, subject 
search, author search, and bibliographic search. In a document search
(often called a "known item" search), the catalog user is aware of the
existence of some particular book or publication that he wants to locate.
In a subject search, the catalog user is interested in both identifying
and locating one or more documents pertaining to some known topic. In 
an author search, the catalog user is aware of some author, publisher’s 
series, or other source of literature and is interested in identifying 
and possibly selecting specific documents from that source. In a 
bibliographic search, the catalog user is interested in using the catalog 
itself to supply or verify bibliographic information regarding a known 
document; he is not interested in locating and using the document. 

The distribution of searches among these four basic types is given 
in Table 4. The distribution was determined in two different ways, 
yielding two different results. The first column is based on the 
immediate objective of the catalog user at the moment of his approach 
to the catalog. Most of the questions asked during an interview 
pertained to this immediate objective.

Table 4

Distribution of Search Objectives

Search Type        Immediate, %    Underlying, %

Document                73              56
Subject                 16              33
Author                   6               6
Bibliographic            5               5

Total                  100             100

However, it was hypothesized that some users may try a search
approach for which a library catalog is particularly suitable as an 
indirect means of performing a different type of search for which 
the catalog is not as well adapted. Specifically, it was hypothesized 
that some of the people performing document searches were really inter- 
ested in subject information and were first seeking such information, 
for convenience, in known documents that they considered likely to 
contain the desired information but that they did not regard as the 
exclusive objectives of their searches. This hypothesis was tested 
by means of a simple question asked at the very end of an interview 
with a catalog user whose immediate objective was a document search. 

The user was asked what he would do if his intended document search 
should be unsuccessful — whether his search would end there, or 
whether he believed he might find what he wanted in some other publi- 
cation. (For obvious reasons it was not necessary to ask this question 
of document searchers who were looking for works of fiction or for items 
on lists of assigned reading.) 

The responses to this question revealed a rather dramatic difference 
between immediate objectives and underlying objectives in catalog 
searches. The distribution of underlying objectives is given in the 
second column of Table 4. It indicates that about a third of the 
catalog users are basically interested in subject or topical information, 
but that half of these users attempt to use a document search to make 
do for a subject search. In terms of underlying interest, document searches 
account for only 56 percent (not 73 percent) of catalog use. 
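
The shift between the two columns of Table 4 can be worked out directly. The short Python sketch below uses only the Table 4 percentages; the variable names are illustrative.

```python
# Percentages from Table 4.
immediate = {"document": 73, "subject": 16, "author": 6, "bibliographic": 5}
underlying = {"document": 56, "subject": 33, "author": 6, "bibliographic": 5}

# Searches that look like document searches at the catalog but whose
# underlying interest is topical: the drop in the document row.
disguised = immediate["document"] - underlying["document"]

# Fraction of all subject-interested users who approach through a
# document search rather than an overt subject search.
fraction_disguised = disguised / underlying["subject"]

print(disguised)                     # percentage points moved between rows
print(round(fraction_disguised, 2))  # roughly one half
```

The 17 points that leave the document row are the same 17 points that enter the subject row, which is why about half of the subject-interested third of users appear at the catalog as document searchers.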

It is interesting to note that no significant variations in the
distribution of search objectives with respect to season of the year,
academic status of user, departmental affiliation, or newness to the
Yale library were detected in this study.

In searches where the immediate intent is to locate a particular 
document, 80 percent of these desired documents are monographs, and 20 
percent are articles in periodicals. 

On the average, about 26 percent of desired documents are already 
known, through previous contacts in the same library or elsewhere, to 
the users who seek them (27 percent of the monographs, 22 percent of the 
periodical articles). Another 22 percent of the periodical seekers have
had some contact with the periodical desired but not the specific article 
desired. Likelihood of previous contact with the desired document increases 
with years of library use, from 22 percent (25 percent of monographs, 10 
percent of periodical articles) for users in their first year of experience 
with the library to 52 percent (58 percent of monographs, 26 percent of 
periodical articles) for users with more than twenty years of experience. 
There is a curious interruption in this general trend that occurs
among users with seven to nine years of experience; these individuals
were found to seek fewer familiar documents than any other group: 18.5
percent (16 percent of monographs, but 30 percent of periodical articles).

In all bibliographic searches and in 98 percent of the document 
searches, the user felt able to state whether his search was successful 
or unsuccessful as soon as he was finished with the catalog. However, 
for 40 percent of the subject searches and 30 percent of the author searches, 
the user stated that he would have to defer judgement on success or failure
until he had looked at specific books or browsed specific stack sections
identified through the catalog. In all, about 91 percent of the catalog 
searches studied were evaluated by the users as successes or failures 
right at the catalog, and 9 percent were evaluated only after further 
effort elsewhere in the library. 
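
As a rough consistency check, the overall figure can be approximated by weighting the per-type proportions given above by the immediate-objective distribution of Table 4. This is a sketch under the assumption that the Table 4 shares apply to the same set of searches; the study does not state how the 91 percent figure was computed.

```python
# Share of searches by immediate objective (Table 4).
immediate_share = {
    "document": 0.73, "subject": 0.16, "author": 0.06, "bibliographic": 0.05,
}

# Proportion of each type judged a success or failure right at the catalog:
# 98 percent of document searches, all bibliographic searches, and the
# complements of the 40 and 30 percent deferred for subject and author.
judged_at_catalog = {
    "document": 0.98, "subject": 0.60, "author": 0.70, "bibliographic": 1.00,
}

overall = sum(immediate_share[k] * judged_at_catalog[k] for k in immediate_share)
print(round(100 * overall, 1))
```

The weighted value comes to about 90 percent, close to the reported figure of about 91 percent; the small gap may simply reflect rounding in the published percentages.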

Results of document searches showed that almost 84 percent turned
up the desired item (with one or more additional useful items identified
in 9 percent of these searches). Another 2 percent turned up useful
supplemental documents, but not the specific document originally desired. 
Only 14 percent of the searches turned up nothing at all. 

Although data for subject and author searches are less complete than 
for document searches (because it was not possible to obtain full reports 
on all searches continued away from the catalog), it appears that there
is no great difference in success rate. Of course, successful author
and subject searches tend to turn up larger numbers of documents than 
successful document searches, since that is almost always the intent of 
the catalog user. 

Users who had come to the catalog to carry out a document search 
were asked to state their intended approach to the catalog. Results are 
shown in Table 5. The author approach dominates. The title approach 
is next. Subject and editor approaches are rare compared to author and
title approaches.

Table 5

Intended Approach to Document Searches

                                                            Percent

Author name (personal or corporate)                           62.0
Title of monograph or periodical                              28.5
Subject term                                                   4.5
Editor name                                                    4.0
Author or title of portion of work (analytic)                  0.5
Series title                                                   0.2
Other (publisher, translator, geographic location, etc.)       0.3

Total                                                        100.0

The decision of some catalog users to conduct a document search by
looking up a subject term may seem strange (4 percent in Table 5), but
it can be thoroughly rational. If the author search would involve looking
through a vast number of cards (e.g., when it is a "U.S." or "Great
Britain" main entry, or when the author has a common last name and the
user does not know his given name), it may take less time to find the
document under an obvious subject entry, providing there are not too many
cards under that particular subject.

Look-up by author or title of an analytic is, in general, a 
fruitless approach to the catalog. This approach is found exclusively 
among users who have had less than two years of experience with the Yale 
libraries. Conversely, look-up by series title (which can be productive, 
but not necessarily) is observed much more frequently among experienced 
users than among those with less than 2 years of experience. 

Users in their first two years of experience account for 55 
percent of catalog use. The first year pattern is close to the over-all 
average with respect to author approaches, title approaches, and subject 
approaches, but lower with respect to editor approaches. The second 
year pattern is higher than the over-all average with respect to title 
approaches, but lower on author approaches and subject approaches; these 
disparities disappear in the third year. 

Statistical analysis did not suggest any particular difference 
in average search success as a function of experience. If there is any 
trial-and-error learning phenomenon to be found, it cannot be very 
prominent. Nor was there any statistical suggestion that newcomers tend 
to become frustrated and tend to avoid the library toward the latter part 
of their first year. Newcomers who use the catalog seem to know pretty
much what to expect from the beginning. Although they may make occasional
false starts, they still tend to find what they seek about as often as the 
more experienced users. This indication is not very surprising when one 
considers only those newcomers who are graduate students or faculty members — 
all of these are familiar with similar libraries at other colleges and 
universities. However, the apparent absence of a conspicuous learning 
phase for freshmen is more puzzling. A possible explanation is that there
really is such a learning phase for freshmen but that it is hidden behind
an abnormally high a priori probability of success in the types of searches
undertaken by freshmen. More than one third of the catalog searches
attempted by freshmen are for documents listed on printed course assignment
lists. Starting with such accurate, well-formatted, locally tested reference
lists, the probability of success is very high. For upperclassmen, the 
proportion of searches based on such lists is only one sixth; for graduate 
students as a group it is one ninth. 
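
The masking effect suggested above can be illustrated by treating overall success as a mixture of list-based and other searches. In the Python sketch below, the list fractions (one third, one ninth) come from the text, but every success rate is a hypothetical value chosen only to show the mechanism; none was measured in the study.

```python
def overall_success(list_fraction, list_success, other_success):
    """Overall success rate as a mixture of list-based and other searches."""
    return list_fraction * list_success + (1 - list_fraction) * other_success


# Hypothetical success rates (NOT from the study).
LIST_SUCCESS = 0.95            # searches starting from assigned reading lists
FRESHMAN_OTHER_SUCCESS = 0.78  # freshman searches not based on such lists
GRADUATE_OTHER_SUCCESS = 0.83  # graduate searches not based on such lists

freshman = overall_success(1 / 3, LIST_SUCCESS, FRESHMAN_OTHER_SUCCESS)
graduate = overall_success(1 / 9, LIST_SUCCESS, GRADUATE_OTHER_SUCCESS)

print(round(freshman, 3), round(graduate, 3))
```

Under these assumed values, a gap of five points in the success of unaided searches shrinks to less than one point in the overall rates, which is the kind of concealment the explanation proposes.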

Search Success and Potential for Improvement 

It was reported in the previous section that 16 percent of
document searches were unsuccessful. A special follow-up study of 256
unsuccessful searches was made in order to learn more about the reasons
for failure. At the conclusion of the interviewing period, the research
staff attempted to re-run the unsuccessful searches, using only the clues
provided by the catalog users during the original interviews. In these
follow-up searches, considerable use was made of such reference works as
Books in Print and Union List of Serials to save time in fully identifying
some of the desired documents before looking for them in the catalog.

Some 31 percent of the follow-up searches turned up the desired
document in the catalog under entries known to the original catalog user
at the time of the interview. These searches had apparently failed 
because of faulty search technique or because of failure to persevere. 
Almost 60 percent of the follow-up searches had potentially adequate 
search clues to documents that did not happen to be in the library 
collection at the time of the search; one fifth of these had been added 
to the catalog between the dates of interview and follow-up. Less than 
10 percent of the follow-up searches (about 1 percent of all attempted 
document searches) could not be evaluated because of inadequate or 
clearly inaccurate clues; it is possible that the desired documents for 
this group were all in the collection. Table 6 summarizes the over-all 
results of document searches in terms of complete success or the three 
types of failure described above. 
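
The conversion from shares of the follow-up sample to shares of all attempted document searches is simple arithmetic. The Python sketch below treats the approximate figures in the text ("some 31 percent", "almost 60 percent") as exact and takes the remainder as the third group, so the results are approximate in the same way.

```python
FAILED_SHARE = 0.16  # document searches that did not turn up the desired item

# Shares of the follow-up sample, as stated in the text.
followup = {
    "in catalog, findable from user's clues": 0.31,
    "not in catalog at time of search": 0.60,
}
# Remainder: the "less than 10 percent" that could not be evaluated.
followup["clues inadequate, not evaluable"] = 1.0 - sum(followup.values())

# Express each group as a rounded percentage of ALL document searches.
overall = {
    reason: round(100 * FAILED_SHARE * share) for reason, share in followup.items()
}

for reason, percent in overall.items():
    print(f"{percent:>3}%  {reason}")
```

The three rounded values, together with the 84 percent of searches that succeeded outright, account for all attempted document searches.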

An apparent conclusion from this study of failed document searches 
is that there is more room for improvement in catalog service through 
instruction of users in the proper use of the catalog (5 percent potential 
improvement) than there is through expansion of approaches to the catalog 
(1 percent potential improvement), at least for document searches. An
even more obvious conclusion is that more and faster acquisition and 
cataloging of new books are desirable (10 percent potential improvement) 
as a means of improving service; but this is a truism in libraries. 

Table 6

Success and Failure of Attempted Document Searches

                                                              Percent

Document in catalog, located by user                             84
Document in catalog, not located by user but found
  by research staff through user's starting clues                 5
Document definitely or probably exists but was not
  in catalog at time of user's search                            10*
Document possibly in catalog, user's clues inadequate or
  grossly inaccurate                                              1

Total                                                           100

* One fifth of this group of documents were added to the catalog
  from 1 to 12 months after the user's unsuccessful search.

The modification or expansion of catalog entries in the existing
catalog apparently has the least potential of all three possible
approaches to improvement of catalog service (i.e., coverage, user
education, and modification). Of course, this conclusion considers only
the "absolute" retrievability of the types of documents which catalog
users seek currently. It does not consider the convenience factor in
catalog searching. One could conjecture that some catalog users fail
to succeed in their searches not out of ignorance, but simply because
they are confronted with too many cards to look through under a valid
entry. If so, catalog modification may be more desirable than it appears.
This would seem especially plausible for the subject and author searches
that are not represented in Table 6. One could further conjecture that
making the catalog more convenient to search would tend to bring more
users (back?) to the library to make searches based on clues that they
know would be inconvenient or unproductive with the catalog as currently
constituted.

Traffic measurements bearing on the question of catalog convenience 
are inconclusive. As was stated earlier, no clear evidence could be found 
of a frustration factor (diminishing catalog use) among freshmen and other 
newcomers in their first year of experience with the library. In fact, 
freshmen tend to increase their frequency of catalog use as they become 
upperclassmen (Table 3); but this could be explained as merely an 
involuntary requirement of their academic programs or else as a heightened 
awareness of the positive aspects of the existing catalog. 

Some fragmentary evidence on the value of different approaches to 
catalog improvement can be derived from comments which were offered 
gratuitously to interviewers by catalog users during the data gathering 
phase of this study. Some 75 individuals stated complaints or suggestions 
about the library. These were noted and preserved. About half of the 
comments reflected on catalog coverage, catalog user education, or catalog 
design. Of this group, 8 comments indicated a desire or need for more
information regarding the use of the catalog (general orientation,
interpretation of abbreviations in catalog, determination of language 
of cataloged work, transliteration and filing rules for non-Roman 
alphabets, interpretation of subject class numbers). Eight comments had 
to do with improving the physical convenience of catalog use. Nine 
comments requested more cross references in the catalog, of several types. 
Three requested more convenient access to periodicals by title. Two 
complained about the generality of subject headings; one complained about 
the inconsistent treatment of a particular topic that appears in various 
subject subsections of the catalog. Seven wanted the catalog cards to 
provide more collation and notes information about the cataloged works.
Four wanted the catalog to provide access or better access to certain
types of literature (journal articles, dissertations). These comments 
do not suggest any unanimity among catalog users as to a single best 
approach to improvement of catalog usefulness. They suggest only that 
there is interest in improvement along all lines of approach. 

The matter of catalog coverage (size of collection, rate of 
acquisition, promptness of cataloging) is beyond the scope of this 
study and will not be discussed further. The general question of user 
education is also beyond our scope. The remainder of this chapter on 
project results will deal primarily with findings on various aspects of 
catalog responsiveness, both current and potential. Some of these findings 
suggest specific needs and opportunities for better orientation of users. 
But the main objective will be to try to clarify how the catalog does 
respond and might respond to the demands which users actually make of 
it currently.

Search Clues, Catalog Access, Catalog Data

As Vickery has pointed out (8), there are four functions that can
be served by the data elements, or items of information, on a bibliographic 
record such as a catalog card: As a group, the elements serve to identify 
(by description) some specific publication. Each individual element can, 
at least in principle, serve as an entry for retrieving the collective 
bibliographic record from a file such as a catalog. One or more elements 
can indicate the physical location of the publication described by the record. 
Finally, the symbolic form used for recording an element (usually letters and 
numerals) can facilitate the sequencing of groups of records for convenient 
access (usually in alphanumeric order). Curran and Avram (9) have identified 
and listed hundreds of bibliographic data elements, of recognized or potential 
value, which might be included and distinguished in the bibliographic records 
of libraries. 
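
The four functions described above can be pictured with a small data structure. The Python sketch below is an illustration only: the `CatalogRecord` fields, the `build_index` helper, and the sample records (names, titles, call numbers) are all invented for the example and do not represent the Yale catalog or any cataloging standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogRecord:
    """A simplified bibliographic record; the field choice is illustrative."""
    author: str
    title: str
    subject: str
    call_number: str

    def filing_key(self):
        """Function 4: normalized symbols for alphanumeric sequencing."""
        return (self.author.lower(), self.title.lower())


def build_index(records, element):
    """Function 2: use one data element as an entry point into the file."""
    index = {}
    for record in records:
        index.setdefault(getattr(record, element).lower(), []).append(record)
    return index


# Hypothetical records; names, titles, and call numbers are invented.
records = [
    CatalogRecord("Roe, Richard", "Filing Rules", "Cataloging", "Z 100 R6"),
    CatalogRecord("Doe, Jane", "Card Catalogs", "Cataloging", "Z 100 D6"),
]

by_subject = build_index(records, "subject")
hit = by_subject["cataloging"][0]  # retrieval through an entry term
print(hit)                         # function 1: the elements identify the work
print(hit.call_number)             # function 3: physical location
print([r.author for r in sorted(records, key=CatalogRecord.filing_key)])
```

Any string element could be passed to `build_index`, which mirrors the point that each element can, at least in principle, serve as an entry.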

The use of a bibliographic data element as an entry term for a catalog 
may be justifiable operationally if the users of the catalog tend to possess 
corresponding search clues they can match against entry terms of that type. 

The catalog studied in this project, like catalogs in most large research 
libraries, offers entry or access primarily by author (main entry) terms, 
title terms, and subject terms. Some observations can be made about the
appropriateness and sufficiency of these approaches to the catalog. 

It was reported above that document (known-item) searches account for
73 percent of the searches attempted. Virtually 100 percent of the documents
have titles (which can be periodical titles as well as monograph titles);
82 percent of the documents have authors (those without authors obviously
being periodicals primarily). Of the documents with authors, almost 92 percent
have personal authors as opposed to corporate authors. There is no reason to
believe that the pattern for documents in unsuccessful document searches is
substantially different from the pattern for documents in successful searches.

[Table: Author and Title Data in Document Searches]

By far the most consistently available and usable search clues brought 
to the catalog by users interested in document searches are clues relating 
to author and/or title. Details of availability of author data and title data 
are compared in Table 7. It is clear from the first two columns that, on the 
basis of absolute count, title data in general, and accurate title data in 
particular, are available much more frequently than corresponding author data. 
This finding is in qualitative agreement with results of other researchers
(2, 10), some of whom have interpreted them as proof that library cataloging
should use titles (rather than authors) for main entries and that users should
be encouraged to access catalogs by title rather than author if both clues 
are available. 

The present study does not clearly support such a sweeping interpretation. 
It should be remembered that the author approach is not even a possibility 
for some 18 percent of the documents desired by users; for these, the main 
entry is by title, and that is obviously the approach which users take in 
searching for these documents. The comparison of author and title approaches 
should properly be related to the 82 percent of documents for which both 
approaches are possible. The third column in Table 7 gives an approximate 
basis for such a comparison by multiplying the absolute title values by 82 
percent. (It is fully accurate if there are no differences in title data for 
authored documents as compared to anonymous documents — which has not been 
determined.) If one now compares the first and third columns, it can be seen
that the differences between author and title clues, in terms of availability
and accuracy, are very much reduced. Titles come out best by a few percent 
in availability as clues (79.5 percent vs. 77 percent). They are considerably 
better than author clues where complete accuracy is the criterion (50.4 percent 
vs. 42.3 percent); but they are actually a bit worse than author clues where 
the criterion is the sum of complete and partially complete clues (62.7 
percent vs. 65.5 percent). 
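
The adjustment behind the third column of Table 7 can be followed with the figures quoted in this paragraph. In the Python sketch below, the author values and adjusted title values are those cited in the text; the "absolute" title values are back-computed by dividing by 0.82 and are therefore implied figures, not values read from Table 7.

```python
AUTHORED_SHARE = 0.82  # share of desired documents that have an author

# Percentages quoted in the text (author column and adjusted title column).
author = {"available": 77.0, "completely accurate": 42.3,
          "complete or partially complete": 65.5}
title_adjusted = {"available": 79.5, "completely accurate": 50.4,
                  "complete or partially complete": 62.7}


def adjust(absolute_percent, share=AUTHORED_SHARE):
    """Scale an absolute title percentage to the authored documents only."""
    return absolute_percent * share


# Implied absolute title values (back-computed, not read from Table 7).
title_absolute = {k: round(v / AUTHORED_SHARE, 1) for k, v in title_adjusted.items()}

# Title advantage over author, in percentage points, after adjustment.
advantage = {k: round(title_adjusted[k] - author[k], 1) for k in author}

print(title_absolute)
print(advantage)
```

After adjustment the title advantage is a few points at most, and it turns negative when partially complete clues are counted, which is the basis for the cautious interpretation that follows.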

The proper interpretation of these results would seem to be that — 
at least for the large academic library — neither author nor title has an 
overwhelming advantage as candidate for main catalog entry and for preferred 
search approach. Perhaps that is why the controversy on this question has 
persisted so long in library circles without any consensus. There seems
to be an advantage to the title approach, but only a slight one. 

One can question why there is such a strong tendency for library users 
to approach the catalog by author (Table 5) if the title approach is just 
about as good if not better. Part of the answer must be connected with the 
nature of the Sterling Memorial Library public catalog. Only three fifths 
of the entire Yale University collection is represented by full sets of 
entries in that catalog; the remaining two-fifths is represented only by 
main entry cards which are usually author entries. Thus, the user of the 
catalog casts a wider net if he searches by author. However, previous 
training in library use must also be a factor in preference for the author 
approach; there is a noticeable tendency for the author approach to be 
used more heavily during the early hours of the day (when experienced 
professionals make their heaviest use of the library) than in the evening 
(when use by students predominates).

Furthermore, uncertainty regarding the completeness or accuracy of
search clues can often favor the author approach. It was found in this
study that author clues and title clues tend to co-vary with respect to
completeness and accuracy. When one is entirely accurate, the other tends
to be the same. However, users generally seem to know at least the last
name of an author, or a reasonable approach to it. If the name is not
too common, the user stands a good chance of finding what he seeks in a
brute force author search of a limited section of the file. If he knows only
the frequently used first word of a title and is not sure of the rest
(e.g., "History of..."), he would have difficulty with a title approach.
Of course, the title approach would be preferable if he has great confidence
in his knowledge of the title and if he has only incomplete knowledge of
an author with a common last name. Statistics on these alternatives were
not studied in this project, but the data gathered would permit such a study.

An indication of the potential for improving the success rate of author
and title approaches to the catalog is found in lines 6 and 7 of Table 7.
For authors, between 0.7 and 1.0 percent improvement could be gained by
providing catalog access to names that are prominent in a document's
description but that are not, strictly speaking, authors by current
definitions. These would tend to be the names of editors, compilers,
translators, and study group chairmen. For titles, there is much greater
potential for improvement through filing under additional terms: from 1.3
percent to 4.1 percent. The types of title-like terms involved here include
subtitles, short titles, series titles, major analytic titles, and popular
designations.

It is noteworthy that all of the document searches represented in
Table 7 were successful despite the indicated major deficiencies in users'
starting clues. The adaptability of the human being in his interaction
with the conventional card catalog must not be overlooked or underestimated
when considering the possibilities of the computerized catalog as an alternative
to the card catalog. Humans using the card catalog were able to compensate
for many inadequacies in completeness, accuracy, and appropriateness of
their starting clues. They used several devices in compensating: brute-force
searching through fairly large portions of the catalog; sampling of
possible alternative spellings; or (quite infrequently) shifting to another
type of search approach (including the subject approach). Success was usually
determined by degree of agreement with one or more starting clues in addition
to the clue used for entry to the catalog.

Achievement of near-human (or, hopefully, better-than-human) facility 
in compensating for inadequacies in search clues would be essential if 
computerized catalogs were to replace card catalogs in large research libraries. 
If there is any mismatch at all, no matter how minor, between search clues 
and file data in a computer search, the computer will fail to retrieve unless 
it is given some definite program that will cause it to ignore particular 
kinds of mismatches. Several methods have already been described in the 
literature by which computers have been programmed to retrieve from biblio- 
graphic files despite errors in search clues. The performance of two of 
these methods (11, 12) was tested, by manual simulation, on actual search
clues and actual corresponding catalog data gathered in this study. Results 
have already been published (6, 7) and will be summarized very briefly here. 

Both of the computer retrieval methods studied made use of both
author and title data. One method (11, 6) truncates these data (in both
the search clue and the catalog file), taking a prescribed number of letters
from the author's last name and the first and second words in the title.
These truncations are matched to achieve retrieval. This method overcomes
ignorance of an author's first name as well as errors in endings of words
and names. The second method (12, 7) compresses author name and title words
according to specific rules for casting out letters and syllables before
clue and file are matched. This method overcomes certain types of common
misspellings. (Both methods, of course, can cause retrieval of incorrect
matches (false drops) as well as correct matches; but this aspect was not
studied here.) Data gathered from a sample of 126 successful document
searches were used to test the retrieval capabilities of both methods. In
77 searches (61 percent), both the author and title were known perfectly
by the catalog user; these documents would therefore be retrieved by any
method at all. In 49 searches (39 percent) there were inadequacies in
clues for author or title or both. Ambiguities in the methods tested
were always resolved in favor of the methods. Both produced the same result
in over-all retrieval: 70 percent; in other words, each method was capable
of producing about 9 percent more retrieval than simple character-by-character
computer matching. Yet human searching had produced 100 percent
retrieval with this same set of searches.
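
The general idea of the two kinds of search key can be sketched in Python. The sketch below is not the published rule set of either method (11, 12): the truncation lengths, the compression rule (keep the first letter, drop later vowels, collapse repeated letters), and the sample record and clues are all assumptions chosen for illustration.

```python
import re


def _words(text):
    """Lower-case alphabetic words of a string."""
    return re.findall(r"[a-z]+", text.lower())


def truncation_key(author_last, title, n_author=4, n_title=3):
    """Leading letters of the surname and of the first two title words."""
    surname = "".join(_words(author_last))
    return "/".join([surname[:n_author]] + [w[:n_title] for w in _words(title)[:2]])


def compress(word):
    """Stand-in compression: keep first letter, drop later vowels,
    collapse repeated letters."""
    word = "".join(_words(word))
    if not word:
        return ""
    out = word[0]
    for ch in re.sub(r"[aeiou]", "", word[1:]):
        if ch != out[-1]:
            out += ch
    return out


def compression_key(author_last, title):
    """Compressed surname plus compressed first two title words."""
    return "/".join(compress(w) for w in [author_last] + _words(title)[:2])


# Hypothetical catalog record and two imperfect user clues.
record = ("Thompson", "Modern Libraries")
clue_endings = ("Thompsen", "Modern Library")     # errors in word endings
clue_spelling = ("Thampson", "Modern Libraries")  # early vowel misspelled

for clue in (clue_endings, clue_spelling):
    print(
        clue,
        "exact:", clue == record,
        "truncation:", truncation_key(*clue) == truncation_key(*record),
        "compression:", compression_key(*clue) == compression_key(*record),
    )
```

In this toy case each key rescues a different clue that exact character-by-character matching would miss, and neither rescues both. Shorter keys of either kind would also match more unrelated records, the false drops mentioned above.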

Although minor improvements might be made in each of the computer 
retrieval methods tested, it seems highly unlikely that either method — 
or any other single method — can be developed to the point where it seriously 
rivals human retrieval capability. On the other hand, there is reason to 
believe that combinations of methods can be developed that will approach 
or surpass human performance. If one were to accept retrieval from either 
of the two methods tested, the combined performance would go up to 80
percent. New computer methods could be devised to consider the data from
many viewpoints at once, just as human catalogers do. However, the use 
of combinations of retrieval methods will probably cause unmanageable 
quantities of false drops also, unless devices are incorporated for suppressing 
retrieval as well as for promoting retrieval. Human catalog users tend to 
suppress retrieval on the basis of clues other than the ones they are using 
for file entry (e.g., subject entries, date of publication, place of publication, 
contents notes, author’s birth and death dates). Obviously, machines could 
do likewise only if data elements other than author and title were available 
in their memories and only if such additional data elements could be accepted 
from the user along with his primary search clues. 

This point must not be minimized. There is a strong temptation among 
would-be catalog computerizers to deal with the high cost of computer memory 
units and processing units by cutting to an absolute minimum the types of 
information to be included in a catalog record. For example, if a computerized
catalog were to be designed for servicing document searches only, the
temptation might be to store only author, title, and call number. It should 
be understood, however, that this would tend to make the computerized catalog 
inherently incapable of achieving the same retrieval efficiency (selection 
and rejection capability) as the conventional catalog. The inclusion of 
other data elements would enhance the computer catalog’s retrieval potential, 
but at a price in storage and processing costs. Further intensive work is 
required on the relative trade-offs in usefulness and cost of including 
additional data elements in a computerized record. 

Date of publication is probably the most obvious candidate for inclusion 
in any file intended for use in document searches. After title-like and
author-like search clues, date clues were the next most common clues among
catalog users (Table 8).

Since only 59 percent of the catalog users have any confidence in
their knowledge of publication date, and since their information when they
have it is frequently off by several years, it is clear that date is not
a very useful clue for primary access to the catalog. Nevertheless, if
primary access by some other clue should not be sufficient to discriminate
among many possible documents (as when author or title clues are imperfect),
even a poorly known publishing date can be very useful in narrowing the
field. A user who searches by author but does not know the author's first
name, yet who knows merely that the book he seeks was written after World
War II, can search a large file section rapidly, rejecting at once any item
published before 1945. Even more intelligently, he can quickly recognize
and reject entire file sections devoted to individual authors with the
same last name who are shown by the catalog to have died before 1945. With
clues other than date, similar rejection processes are possible, but they
can be much more subtle and require study.
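
The rejection process described above can be expressed as a simple filter over candidate cards. The Python sketch below is an illustration under stated assumptions: the `Card` fields, the `reject_by_date` helper, and the sample cards are invented, and the rule follows the user's heuristic as described rather than any tested retrieval program.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Card:
    """Minimal catalog card; fields chosen for this illustration only."""
    author: str
    title: str
    published: int
    author_died: Optional[int] = None  # None when no death date is shown


def reject_by_date(cards, not_before):
    """Keep cards compatible with 'written in or after `not_before`'."""
    kept = []
    for card in cards:
        if card.author_died is not None and card.author_died < not_before:
            continue  # author's whole file section can be skipped
        if card.published < not_before:
            continue  # item itself is too early
        kept.append(card)
    return kept


# Hypothetical file section under one common surname.
section = [
    Card("Smith, A.", "First title", 1921, author_died=1930),
    Card("Smith, B.", "Second title", 1938),
    Card("Smith, C.", "Third title", 1952),
    Card("Smith, D.", "Fourth title", 1947, author_died=1940),
]

for card in reject_by_date(section, not_before=1945):
    print(card.author, card.title, card.published)
```

Note that the death-date rule also discards posthumous editions, such as the last card here. That is acceptable for a user who knows the work was written after 1945, but a program applying the rule would need the same understanding of what the date clue means.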

So far the discussion in this section has been directed at document 
searches. It is equally applicable to other types of searches as well. 
Bibliographic searches are identical to document searches in approach. They 
differ only in the final object of the search — full bibliographic descriptions, 
rather than book locations. Author searches are also similar to document 
searches in that they start with author-like, or title-like entry clues 
(author, editor, series name, publishing institution). They differ in that 
their object is to identify a list of possibly pertinent references for the 
user’s further consideration either at the catalog or away from the catalog. 

Table 8 

Availability and Accuracy of Publication Date 
Information in Document Searches 

"No information" on date                 41% 
Information more than 5 years wrong      12% 
Information 2-5 years wrong              13% 
Information 1 year wrong                 10% 
Correct year                             19% 

The (overt) subject search differs in approach, since entry must be by a 
subject heading rather than by an author or title. The process of 
incremental discrimination is, if anything, much more pronounced in 
subject searches than in author or title searches. 

Users engaged in subject searches frequently complain that subject 
sections in the catalog are much too large and general, rarely narrowed 
sufficiently to cover only the particular subject aspect of interest 
to the user. Consequently, the user is forced to deal with a file section 
containing large amounts of unwanted material. He copes by going through 
this section rapidly, scanning the cards for clues by which he can select 
or reject them. Most subject searchers are interested in retrieving only 
the few most pertinent items; they are usually not interested in building 
comprehensive bibliographies of conceivably pertinent items. Just about 
any category of information that appears on a catalog card can be helpful 
to a subject searcher at one time or another. A scope note or contents 
annotation can often clinch the pertinence of a document. An informative 
title or subtitle, or an additional subject heading, can sometimes do the 
same thing. The author, if his name is familiar to the searcher, can 
serve as a basis for selecting or rejecting an item. In many subject 
searches, material can be accepted or rejected on the basis of age, as 
indicated by publication date for example. Material is often rejected on 
the basis of language. The class number is often valuable as a guide to 
later browsing activity. When a subject search turns up numerous items 
of equally uncertain pertinence, users frequently make use of collation 
information (especially the number of pages) as a means of identifying 
the items that can be scanned and handled with the least effort. The 
extent to which a searcher employs these devices appears to be related 
to both the original objective of his search and to the quantity of possibly 
useful references turned up by his initial search clue. Data collected in 
this study have not yet been studied in a quantitative way with respect to 
this phenomenon; however, it seems fairly clear that there would be general 
benefit to subject searchers if catalog cards were filed by publication 
date within a given subject heading rather than by main entry. 
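
The two filing orders can be compared in a short sketch. It assumes each card is a dictionary with hypothetical keys `subject`, `main_entry`, and `year`. The report does not say whether newer or older items should come first, so the direction is left as a parameter.

```python
def file_by_main_entry(cards):
    """Conventional order: subject heading, then main entry (author or title)."""
    return sorted(cards, key=lambda c: (c["subject"].lower(), c["main_entry"].lower()))


def file_by_date(cards, newest_first=True):
    """Suggested order: subject heading, then publication year.

    Main entry is kept only as a tie-breaker within a year. The choice of
    `newest_first` is an assumption; the source does not specify a direction.
    """
    sign = -1 if newest_first else 1
    return sorted(
        cards,
        key=lambda c: (c["subject"].lower(), sign * c["year"], c["main_entry"].lower()),
    )
```

With date filing, a searcher who wants only recent material can stop scanning a subject section as soon as the dates pass out of the range of interest, instead of reading every card under the heading.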

One of the questions to which this study was originally addressed is 
whether it would be possible to derive useful catalog records directly 
from newly acquired documents (e.g., as and when adequate print reading 
devices are developed) with little or no creative input from a professional 
cataloger. To facilitate study on this question, the "front matter" of 
retrieved documents (title pages, contents, preface, index, etc.) was 
scrutinized, and photocopied when justified, for comparison with catalog cards 
and with user clues for the same documents. It is quite clear from only 
qualitative perusal of this material that there is at present no possibility 
of deriving an efficient all-purpose catalog in this manner without extensive 
human intervention. 

There does appear to be some promise to the idea of providing for only 
document searches and bibliographic searches by this approach, but there 
are distinct disadvantages along with the advantages. From the user’s 
viewpoint, it would be an advantage to have a machine provide for access 
by every title and subtitle and series title appearing in the front part 
of a book, rather than by only a human-designated "correct" title. On 
the other hand, a machine might not be as helpful as a human in identifying 
the first significant word in a title for filing purposes. At the very 
least, a human would have to tell the machine what language it was dealing 
with. In the case of authors, it is clear that there can be definite 
disadvantages to simplistic cataloging. The title page of a book will 
sometimes contain only the initials of an author who is generally known 
by his full name; retrieval would still be possible, but more difficult, 
especially if the author's last name and initials are shared with many 
other authors. Furthermore, the title page of a printed document does 
not necessarily tell the truth or the whole truth (13), as when there 
are falsified publication dates, omitted edition data, pseudonymous 
authors, etc. Scholarly assistance with these matters by human catalogers 
may not be indispensable, but it is certainly of tangible value. Further 
study of the economic factors (cost versus benefit) is needed. 

Possibilities for achieving adequate subject cataloging through purely 
automatic processing of front matter appear to be even less promising than 
for descriptive cataloging. It is apparent from scanning the data assembled 
in this study that automatic subject cataloging (based on title page, contents, 
preface, index, etc.) would very rarely provide positive retrieval on as 
little as a two-term coordination of a subject searcher’s starting clues. 
Sometimes the concepts are present, but are stated in terms that are synonyms 
or partial synonyms of the terms known to the searcher. So, at the very 
least, a mechanism would be required for dealing with synonyms, either at 
the time of cataloging or at the time of searching. Quite often books lack 
informative titles or informative chapter titles; and they very frequently 
lack indexes. When indexes are included, they can be very long and full 
of trivial topics which a computer could not easily distinguish from an 
important topic. Thus, clutter in the computer memory would be a very 
serious problem if automated subject cataloging were to be attempted. 
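
The notions of term coordination and synonym handling can be made concrete with a small sketch. Everything here is assumed for illustration: the function names, the flat synonym table that maps each variant to one preferred term, and the treatment of front matter as a plain list of words. It is not the procedure used in the study.

```python
def normalize(term, synonyms):
    """Map a term to its preferred form; unknown terms map to themselves."""
    term = term.lower()
    return synonyms.get(term, term)


def matched_terms(clues, front_matter_terms, synonyms=None):
    """Return the searcher's clues (in preferred form) found among the front-matter terms."""
    synonyms = synonyms or {}
    indexed = {normalize(t, synonyms) for t in front_matter_terms}
    return {normalize(c, synonyms) for c in clues} & indexed


def retrieves(clues, front_matter_terms, synonyms=None, required=2):
    """True if at least `required` distinct clue terms coordinate on this document."""
    return len(matched_terms(clues, front_matter_terms, synonyms)) >= required
```

Setting `required=2` corresponds to a two-term coordination, and `required=1` to a one-term match. Applying the same synonym table to both clues and front-matter terms stands in for a mechanism that could operate either at cataloging time or at search time.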

Despite all of the drawbacks enumerated here, however, the idea of automated 
subject indexing does have some merit. It appears that such an approach 
would usually provide at least a one-term match with the clues of the 
subject searcher; if adequate secondary selection-rejection possibilities 
were built into the catalog system, this might conceivably suffice to 
achieve adequate performance. This question requires further quantitative 
study. 

Discussion 



The results of this study are described in the previous chapter; 
they are listed briefly in the "Summary" chapter at the beginning of this 
report. 

Librarians can undoubtedly draw meaningful conclusions from some of 
these results. But it is important to note that the conclusions to be 
drawn will depend upon the librarian’s general outlook. The librarian 
who believes strongly in expansion of library services will derive quite 
a different message from the librarian who believes very strongly in 
reduction of processing costs. 

For example, consider the findings on immediate search objectives 
(73 percent document searches, 16 percent subject searches) and underlying 
search objectives (56 percent document searches, 33 percent subject searches). 
The expansive librarian might well conclude that he should put more effort 
into improving the subject approach to the catalog so that fewer users 
will be forced to sublimate their subject searches as document searches. 

The retrenching librarian, on the other hand, might conclude that the 
subject approach to the catalog should be eliminated entirely since this 
would do less harm to the utility of the catalog than the elimination of 
the author or title approaches. These two conclusions would be completely 
opposite, yet there is logic in each of them. 

Flexibility, of course, is the rule in successful management. 
Librarians pick and choose among alternative courses of action in order 
to achieve the best possible results with the limited resources available. 
The interpretation of the results of this study requires knowledge of the 
trade-offs of costs versus benefits — and more study is needed along that 
line. Nevertheless, some general observations may be in order. 

The fact that 5 percent of all document searches fail even though 
the catalog users have adequate starting clues suggests that strong 
consideration should be given to improved user orientation and user 
assistance. User education, or self-education, methods need not be very 
costly; and it would be relatively easy to determine whether or not they 
are effective. 

The fact that 10 percent of all document searches fail because the 
collection lacks the desired document suggests that something might be 
done to acquire more books in anticipation of need and to provide better 
notification of books that are on hand or on order but not yet cataloged. 
(A step in this direction at Yale was the placement of a copy of the 
"in-process" list in the catalog area some months ago.) 

Increasing the complexity and accessibility of the catalog offers 
comparatively little potential for improvement of the success rate of 
searches currently attempted. However, improving the convenience of 
catalog use might attract heavier use of the catalog and the library 
collection. Providing access through a greater variety of title-like 
entries is a promising approach to improvement of document search 
convenience. Filing by date within subject headings is a promising approach 
to improvement of subject search convenience. 

Data elements other than author, title, and subject are of definite 
value in resolving many searches in which the entry clues are ambiguous 
or inaccurate. Such data elements should probably not be abandoned 
entirely, even in a computerized catalog where data storage is very expensive. 
Further study is warranted on the costs and benefits of acquiring, storing, 
and retrieving such data elements, in order to determine their relative 
values or expendability. 

There is promise in the idea of using automated techniques for 
catalog construction. However, it is unrealistic to expect impressive 
retrieval performance from such catalogs if they contain only information 
copied directly from input documents without some degree of annotation 
and association, whether human-supplied or computer-supplied. 

Introduction 

A library catalog is intended to make it rel- 
atively easy for a library user to identify and 
locate desired items in a collection. The cata- 
log is a bridge between the information which a 
user brings to the library (in the form of written 
notes, or remembered clues) and the information or 
documents he hopes to carry away from the library. 
In order to assess the adequacy of performance of 
a catalog, one must study how efficiently the cat- 
alog matches the initial clues to the desired 
items. Such a study has been undertaken at the 
Yale University Library under a grant from the 
U. S. Office of Education. 

Two purposes may be served simultaneously by 
a study of catalog performance. In the short run, 
there is the possibility of effecting minor im- 
provements in the traditional library card catalog 
through the identification of desirable modifica- 
tions in cataloging practice. In the long run, 
there is the possibility of effecting major im- 
provements in catalog performance in the future as 
catalogs are converted from card files to computer 
files. At the very least, there is the long-range 
prospect of forestalling costly blunders during 
the transition from cards to computers. Computer- 
ization should be addressed to the needs of users 
and the capabilities of computers; to simply at- 
tempt to mechanize existing card catalog systems 
would be to institutionalize the shortcomings of 
card catalogs rather than eliminate them. But 
shortcomings of card catalogs must be identified 
before they can be eliminated. 

Although a great many prior studies of li- 
brary use have been reported, especially as mas- 
ter's theses in library schools, they are of 
little, if any, use for the purposes stated above. 
In almost all cases, they are based on question- 
naires or interviews administered after (sometimes 
long after) users had finished whatever they had 
come to the library to do. They elicited consid- 
erable information on what the users actually ac- 
complished in the library, but very little infor- 
mation on what the users originally hoped to ac- 
complish when they came to the library. In most 
cases, no attempt was made to determine the pre- 
cise search clues, recorded or unrecorded, avail- 
able to the users when the search began. Because 
of the frailty of human memory and because of the 
tendency of users to change their search object- 
ives after they run up against the realities of a 
less-than-ideal system, it would seem that the 
only way to determine initial search clues is 
through interviews conducted at the beginning of a 
library search, not afterward. No reports of such 
studies were found in the literature, except for a 
few in which the samples were so small and poorly 
described as to be useless for interpretation. 
After the present study was begun, it was learned 
that a report is in preparation (1) on a study of 
this type which involved 100 interviews with 
users of the University of Chicago Library cata- 
log, all conducted during the summer vacation 
period. After the present study got under way 
another apparently similar study was begun at the 
University of Michigan Library under sponsorship 
of the National Science Foundation (2). Every 
effort will be made to compare results of similar 
studies. 

Design 

The present study examines catalog user ob- 
jectives, starting clues, and catalog responsive- 
ness in a large academic and research library. 

The library under study is Sterling Memorial Li- 
brary, the largest library unit at Yale University. 
This library has a book collection of 3 million 
volumes. Its card catalog is of the single-alpha- 
bet type, covering all of the volumes in the 
building with full cataloging and the balance of 
the 5-million volume collection of the Yale 
University Library system with main-entry (author) 
cards. The catalog is used by undergraduates, 
graduate students, faculty, university staff, and 
visitors. Although there are a number of depart- 
mental library collections at the university, the 
Sterling Memorial Library collection is the major 
research collection for many departments, and an 
important back-up collection for all other depart- 
ments. 

Five distinct data-gathering activities are 
included in the study: 

1. Gross Statistics on Catalog Use 

Parameters of catalog use are determined by 
simple observation, counting, and timing of traf- 
fic in the catalog area. This is necessary in 
order to provide a sound basis for selection of a 
representative sample of catalog users to be inter- 
viewed. Statistics being collected include the 
number of users entering the catalog area versus 
time of day, day of week, and day of year. Al- 
though not essential for sample determination, 
statistics are being collected also on the amount 
of time spent in the catalog by users, the number 
of card drawers consulted per catalog use, the 
frequency of use of individual card drawers, the 
number of cards searched per catalog use, and the 
number of charge-out slips filled in per catalog 
use. Such statistics are of immediate administra- 
tive interest as well as of more general research 
interest. 

2. Interviews Preceding Initiation of Catalog 
Searches 

Using an objective selection technique, 
interviewers approach catalog users at the moment 
of initiation of a search and attempt to determine 
the users' objectives and the precise clues with 
which they start. This is the most critical 
phase of the entire study. Selection of inter- 
viewees is made objective by tying it only to the 
clock; each interviewer must interview the first 
person entering a given portal to the catalog area 
after a time specified in advance on a prepared 
schedule. The schedule reflects the observed den- 
sity of catalog traffic for each day and time of 
day. 
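
One way to build a clock-tied schedule that reflects traffic density is sketched below. It is a hedged illustration only. The report does not describe how its schedule was computed, so the proportional allocation, the largest-remainder rounding, the random offsets within each block, and the names `allocate_interviews` and `draw_start_times` are all assumptions.

```python
import random


def allocate_interviews(traffic_counts, total_interviews):
    """Split `total_interviews` across time blocks in proportion to observed traffic.

    Largest-remainder rounding makes the allocations sum exactly to the total.
    """
    total_traffic = sum(traffic_counts.values())
    if total_traffic <= 0:
        raise ValueError("traffic counts must sum to a positive number")
    exact = {b: total_interviews * n / total_traffic for b, n in traffic_counts.items()}
    alloc = {b: int(x) for b, x in exact.items()}
    leftover = total_interviews - sum(alloc.values())
    # Give the remaining interviews to the blocks with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda b: exact[b] - alloc[b], reverse=True)
    for block in by_remainder[:leftover]:
        alloc[block] += 1
    return alloc


def draw_start_times(alloc, block_minutes=60, seed=None):
    """Pick a minute offset within each block for every allocated interview."""
    rng = random.Random(seed)
    return {
        block: sorted(rng.randrange(block_minutes) for _ in range(n))
        for block, n in alloc.items()
    }
```

Each drawn offset would be the "time specified in advance": the interviewer takes the first person to enter the assigned portal after that minute, so the choice of interviewee does not depend on the interviewer's judgment.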

Interviews are conducted according to a rigid 
plan which was shaped and tested over a period of 
months. A nondirective approach is used at the 
beginning of the interview, to minimize interview- 
er bias and to encourage free expression by the 
interviewee. As the interview progresses, ques- 
tions become more direct in order to fill in im- 
portant details. Responses are recorded by the 
interviewer on a form, then coded and transcribed 
at a later date for computer-aided analysis. To 
save time and preserve accuracy (or inaccuracy) of 
spellings, a photocopier is used to copy notes and 
bibliographies which users bring to the catalog. 

3. Follow-Up Interviews 

Each person interviewed at the start of a 
catalog search is observed unobtrusively until the 
search seems to have been completed or terminated. 
The catalog sections searched by the user are 
noted. He is then approached once more and ques- 
tioned briefly about his success. If he has re- 
trieved call numbers which satisfy his require- 
ments, these numbers are recorded to permit later 
inspection of catalog data and corresponding docu- 
ments. In cases where the user indicates that his 
search will be continued through inspection of 
potentially pertinent documents located through 
the catalog, a further follow-up interview is re- 
quested to determine the results. 

4. Examination of Catalog Cards and Catalog 
Structure 

Using actual clues and actual research re- 
sults obtained from interviews as starting points, 
the existing card catalog can be examined at 
leisure. The cards can be examined to see whether 
they actually contain the types of clues that are 
brought to the catalog by the users, or, indeed, 
whether they contain clues that are not wanted. 

The file arrangement and file headings can be com- 
pared with the search approaches taken by catalog 
users. (This is particularly interesting at the 
start of the school year, before the new students 
and faculty become familiar with the existing cat- 
alog structure.) Hypotheses regarding possible 
changes in cataloging rules or catalog arrange- 
ment can be tested against observed user require- 
ments. 

5. Examination of Retrieved Documents 

Books identified as pertinent as a result of 
catalog searches can be borrowed for examination 
after the catalog user is finished with them. By 
examining such a book, especially its front matter 
(cover, title page, contents, preface, etc.), one 
can judge whether a hypothetical change in cata- 
loging practice could have provided a more conve- 
nient match to the clues brought to the search by 
the interviewee. One can also judge the extent to 
which mechanized techniques, e.g., automatic 
print readers, could have satisfied the search re- 
quirement. 

Work Accomplished 

Data collection began in early October 1967, 
with the initiation of observation of the gross 
characteristics of traffic in the catalog. This 
form of data collection will continue for approxi- 
mately 18 months, to provide a full-year profile 
and an indication of year-to-year fluctuation and 
trend. 

Interviewing began on a pilot basis in early 
October 1967. A total of 200 pilot interviews 
were conducted by three interviewers before inter- 
view structure and technique were standardized. 
Interviews for actual data analysis were begun in 
March 1968. At this writing (late April) some 200 
"production" interviews have been conducted, and 
virtually all of these had associated follow-up 
interviews. Refusals of interviews have been 
negligible; user cooperation with interviewers has 
been outstanding. Names of interviewees are not 
asked, but academic status and departmental affil- 
iation are recorded. It is anticipated that 2000 
or more production interviews will be completed in 
the 12-month period during which interviews will 
be conducted. 

Data gathering on catalog cards, catalog 
structure, and content of retrieved documents is 
in the earliest stages at this writing. 

Results 

The small amount of information collected so 
far does not warrant any firm or provocative con- 
clusions. It has been observed, however, that the 
initial search objectives of catalog users can be 
divided into four distinct types. These are, in 
order of decreasing frequency: 

1. Specific document search, in which the ob- 
jective is to locate a document already 
known to exist. 

2. Subject search, in which the objective is to 
identify a document or documents correspond- 
ing to a specific subject. 

3. Document group search, in which the objec- 
tive is to identify a document or documents 
corresponding to a specific bibliographic 
description (e.g., any books by a given 
author, any books in a given series). 

4. Bibliographic data search, in which the ob- 
jective is only to retrieve specific infor- 
mation from the catalog card itself and not 
to identify or locate a document (e.g., com- 
pleting a reference for a bibliography). 

Of the production interviews conducted so far, 
80 percent were concerned with the first type of 
search. It is possible that this percentage may 
change as more data are collected; there may be 
large fluctuations in catalog use patterns 
throughout the academic year. Detailed findings 
will be given in future reports. 

Among people who are concerned with the management of libraries, it is now 
almost universally accepted that the traditional manual card catalog must soon- 
er or later be replaced by an on-line computerized catalog of some sort. This 
is accepted almost as an article of faith; there is almost never any question- 
ing or disputing of its inevitability. I have no intention of questioning or 
disputing its inevitability in this paper. But there are questions regarding 
the computerizing of library catalogs which ought to, and indeed do, trouble 
conscientious library managers. These are the crucial questions of how to com- 
puterize and when to computerize. The work I will report on was prompted main- 
ly by concern with these questions. 

The notion of computerized library catalogs has been with us for many 
years. Computerized library catalogs were, in fact, set up at libraries here 
and there as far back as a dozen or more years ago — which means during the era 
of the first generation of large computers. They operated in batch mode, of 
course, and on rather restricted document collections; but they operated. And, 
as the years have passed, the catalogs or indexes of more and more document 
collections have been committed to computers. 

The appeal of computers is obvious. There is, first of all, the speed 
and accuracy with which they can perform basic functions, such as filing in of 
new data, compiling statistics, transcribing data for human reading, and trans- 
mitting data for use by other machines. There is the ability of computers to 
perform complex logical searches, at least on pre-designated elements of the 
stored data. And, very important, there is now the ability of computers to 
serve numerous users simultaneously at diverse locations, by means of time-shared 
terminals, to obviate the need for the users to be in physical attendance at the 
catalog storage location. 

Nevertheless, the use of computerized catalogs today is still highly re- 
stricted. It tends to be confined to applications where the document collection 
is relatively small, where the catalog information is very simple and limited, 
where there is an unusually high value attached to rapid or remote catalog 
service, where large computing capacity is already available for purposes un- 
related to the library. This is because of the negative aspects of computers: 
the high cost of converting existing catalogs to machine-readable form; the high 
cost of computers; the unavailability of really large-scale rapid-access memory; 
the limited reasoning capacity of existing computer programs. 

Because the negative aspects of catalog computerization have been particu- 
larly serious for the very large general-purpose library, of which the Yale 
University Library is a prominent example, there has long been a tendency for 
management in these libraries to regard catalog computerization as probably in- 
evitable but clearly remote. Therefore, it could be dismissed from serious at- 
tention. That attitude can no longer be justified. Recent events have indicated 
that the time when conversion will be practical for large libraries may not be so 
remote after all — indeed may be only a few years away. Events contributing to 
this change have included: the steady growth of rapid memory capacity of compu- 
ters; the falling cost of computing capacity; the improvement of equipment and 
of programs for remote-terminal time sharing; the establishment of the MARC sys- 
tem to make new catalog data available in machine readable form at low cost; 
the development of regional library groups which have the potential to make 
existing catalog data available in machine readable form at low cost through 
cooperative effort; the development of standard machine formats which will make 
data interchange possible and economical. 

So the decision on when to computerize the catalog of the very large li- 
braries may soon become a matter of tactics, rather than strategy. At this 
point, the question of how to computerize the very large catalog is in need of 
urgent attention. The natural tendency, of course, would be to create a comput- 
erized catalog in the image of the existing manual card catalog, preserving all 
features of present-day catalog content and file organization. Tradition tends 
to be very strong among catalogers in large libraries. Yet tradition must be 
resisted, or at least questioned. Existing card catalogs are not necessarily 
the ultimate in human wisdom and ingenuity. Certainly some of the features in 
their design are attributable to the inherent limitations of cards and card 
drawers. There is no need to perpetuate the weaknesses of present catalogs in 
future catalogs. Before computerizing our catalogs, it would be very desirable 
for people in large libraries to take a hard look at what we would want from an 
ideal catalog, then see what sort of design in a computerized catalog would 
most closely approach that ideal. The key question is "What do we want from a 
library catalog?" One of our research projects at the Yale University Library 
is endeavoring to provide an answer to this question. 

The approach we have taken is very direct. We are trying to learn what a 
future catalog should be by studying, quantitatively, what our library patrons 
are trying, successfully or otherwise, to get out of our present catalog. This 
study is supported, in part, by the Office of Education (1). The basic idea of 
a catalog use study is not at all new. There are quite a few such studies al- 
ready reported in the literature, mostly master's thesis projects. Unfortunate- 
ly, almost none of them inspire any confidence in the results because of gross 
deficiencies in experimental design, sample size, or both. Our own study was 
carefully designed to anticipate and obviate any foreseeable criticism. It is 
a two-year study which began in late 1967 and will be completed late this year. 

Actually our study is much broader than I indicated in my introduction: 
It attempts to find out what our users want from a catalog, but it does not 
stop there. It also attempts to find out the extent to which our present card 
catalog satisfies the needs of the users. And, furthermore, it attempts to 
find out whether there are practical methods, manual or mechanized, to satisfy 
needs that are not now being met. Thus, even if we do not computerize our cat- 
alog for many years, the study should be useful in perfecting our traditional 
card catalog in the meanwhile. 

Because the study is still in progress, I am unable to give any final re- 
sults. The collection of data is more or less complete, but many of the pro- 
jected analyses of the data have not yet been accomplished. Therefore, I will 
confine myself mainly to describing how the study has been carried out and 
stating what we should be able to learn from it. I will state some of our pre- 
liminary findings, but I must emphasize that all figures to be quoted here are 
based on incomplete data and are subject to possible revision in our final re- 
port. 

The public catalog of the Yale University Library is located in the main 
entry hall of the Sterling Memorial Library. It contains some 7 million cards, 
housed in some 7000 file drawers. It is a single-alphabet catalog. It con- 
tains full catalog card sets for the more than 3 million volumes housed in 
Sterling Memorial Library and only main-entry cards for the 2 million volumes 
housed in other libraries at Yale. Since the numerous school and departmental 
libraries have more complete catalogs for their respective collections, users 
of the main catalog are generally in search of books that are housed in the 
collection at Sterling Memorial Library. The stacks of Sterling Memorial Li- 
brary are open to all Yale faculty and students, and to a rather large number 
of authorized outside users of the library. The catalog, as you can imagine, 
takes up a rather large area, and is the scene of constant activity throughout 
the hundred hours a week that the library is normally open. 

A catalog search is basically a word-matching procedure. The searcher 
seeks to match some known clue, which is commonly a word or a phrase or a name, 
against the headings in the file; if he succeeds in finding a file item which 
matches his clue, he can expect to find some associated information in the file 
(e.g., a call number) which is the object of his search. In a nutshell, the 
aims of our study are to find out: 1) what clues the catalog users possess when 
they begin a catalog search; 2) how well our present catalog responds to (i.e., 
matches) the clues that the user brings; and 3) whether the responsiveness of 
the catalog might be improved through some change(s) in catalog design. 

We are finding out what clues the users bring to their catalog searches 
through interviews with a representative sample of catalog users. The inter- 
viewees are approached at the instant that they reach for a catalog drawer to 
begin a search; they are asked a number of carefully worked out questions de- 
signed to elicit very precisely what the searcher is trying to accomplish 
through the catalog and what information he has brought to the search. We also 
collect background information about the searchers (but we do not ask for their 
names). The interviewers are all trained to follow a standard interview outline. 
At the beginning, the questions are very general and nondirective, to avoid
leading the subject. ("Could you please tell me what you were about to do
here at the catalog when I interrupted you?") Only after the subject has had
ample opportunity to say whatever he wants to, in his own way, do the questions
become more direct and specific. Clues available to the searcher are recorded
in full detail. If he carries them in the form of a printed bibliography or as
handwritten notes, they are photocopied by the interviewer. If he carries them
in his mind, they are transcribed by the interviewer, who takes pains to determine
and preserve the searcher's personal version of the spelling of author names
and unusual words.

An average interview takes about ten minutes; but it may take as little as 
two minutes or more than fifteen minutes, depending on the nature of the 
searcher's problem and the amount of information which he brings to the search. 
When the interview is concluded, the subject is left alone to carry out his 
search, but is observed discreetly from a distance. The catalog drawer which 
he uses is noted. When he appears to have finished, he is approached again and 
asked if he was successful. If so, the interviewer notes the call number(s) of
the item(s) which satisfied the search. Later on, we can examine the catalog
cards for these call numbers, and we can examine the books themselves, to see
how well the existing catalog matched, and how well it might have matched, the
clues which the user had when he began his search. This follow-up activity to
examine the catalog cards and the books they represent is considerably less 
glamorous and exciting than face-to-face interviewing, but it is every bit as 
important to our study; and it actually takes more time and effort than the 
interviews. 

The interview program, concluded only this month, was conducted over a 
full calendar year. We gathered data from some 2,000 interviews. The catalog 
users were cooperative beyond our wildest dreams. Fewer than 1 percent of the
people approached refused to be interviewed, generally because they had
to rush off to a class. Most interview subjects were delighted to be asked
about their activities and eager to respond to all questions. Because of the
accidents of random sampling, some people were interviewed two or three times 
during the year, and they still remained fully cooperative. To put it simply, 
the library users were very happy to learn that somebody actually cared about 
them. 



At this point, I should explain how the interviewees were selected in or- 
der to provide a representative sample. Long before we began any interviewing, 
we had already begun collecting gross statistics on observed traffic in the 
catalog area, and on various activities which occur in the catalog area. There 
happen to be five different entrances to our catalog area. By counting the 
number of people entering through each doorway at various times on different 
days, we constructed a preliminary projection of expected traffic by day of 
week and time of day. We then decided how large an interview sample we wanted 
(at least 1 percent). To get this, we worked out a precise interview schedule 
for each doorway in which the interview times and dates are in proportion to 
the expected traffic. Thus, each of our interviewers (2 full time, with a third 
available to help in emergencies) was assigned to be at a specific doorway at a 
specific hour and minute; and the first catalog user who entered through that 
doorway before a fixed interval elapsed was the person to be interviewed. Then
the interviewer would go on to his or her next assignment, which would generally
be at a different doorway. Assignments were spaced to allow reasonable time for
completion of one interview before starting the watch for the next one.
Sometimes no one would come through the doorway during the scheduled interval and
so there was no interview; however, this is a random event which does not affect
the value of the sampling technique.
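
The scheduling idea described above, interview slots assigned in proportion to
expected traffic, can be sketched roughly as follows. This is only an
illustration: the function name, the largest-remainder rounding rule, and all
traffic figures are assumptions made for the example, not details of the study.

```python
from fractions import Fraction

def allocate_interviews(expected_traffic, total_interviews):
    """Split total_interviews across cells in proportion to expected traffic.

    Uses the largest-remainder method (an assumption of this sketch) so that
    the allocations are whole numbers summing exactly to total_interviews.
    """
    total = sum(expected_traffic.values())
    exact = {cell: Fraction(count * total_interviews, total)
             for cell, count in expected_traffic.items()}
    alloc = {cell: int(share) for cell, share in exact.items()}  # floor
    leftover = total_interviews - sum(alloc.values())
    # Hand the remaining slots to the cells with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda c: exact[c] - alloc[c], reverse=True)
    for cell in by_remainder[:leftover]:
        alloc[cell] += 1
    return alloc

# Hypothetical projected entries per (doorway, day, hour) cell.
expected = {
    ("door A", "Mon", 10): 120,
    ("door B", "Mon", 10): 60,
    ("door A", "Sat", 10): 14,
    ("door C", "Mon", 20): 6,
}
schedule = allocate_interviews(expected, total_interviews=20)
for cell, slots in schedule.items():
    print(cell, slots)
```

Busy doorway-and-hour cells receive more interview slots, so each person
entering has roughly the same chance of being selected wherever and whenever
he arrives.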

What can affect the value of the sampling technique we used is seasonal 
variation in traffic pattern. Therefore, we continued the gross traffic count- 
ing program for more than a year in order to detect such variations. Differences 
between the observed pattern and the preliminary projection on which the
interview scheduling was based will be compensated by applying appropriate weighting
factors to the results of interviews conducted at different times of day and times of
the year, so as to make the statistical results entirely representative of
observed traffic.
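
One plausible form for such weighting factors is the ratio of a period's share
of observed traffic to its share of completed interviews. The sketch below
assumes that form; the period names and all counts are hypothetical, and the
actual weighting scheme used in the study is not specified here.

```python
def weighting_factors(observed_traffic, interviews_done):
    """Weight for each period = share of observed traffic / share of interviews."""
    total_traffic = sum(observed_traffic.values())
    total_interviews = sum(interviews_done.values())
    weights = {}
    for period, n_interviews in interviews_done.items():
        if n_interviews == 0:
            continue  # nothing to weight in a period with no interviews
        traffic_share = observed_traffic[period] / total_traffic
        sample_share = n_interviews / total_interviews
        weights[period] = traffic_share / sample_share
    return weights

def weighted_proportion(outcome_counts, interviews_done, weights):
    """Weighted share of interviews showing some outcome (e.g. a document search)."""
    num = sum(weights[p] * outcome_counts[p] for p in weights)
    den = sum(weights[p] * interviews_done[p] for p in weights)
    return num / den

# Hypothetical counts, for illustration only.
observed = {"fall": 50_000, "spring": 40_000, "summer": 10_000}
done = {"fall": 400, "spring": 400, "summer": 200}
document_searches = {"fall": 300, "spring": 280, "summer": 120}

w = weighting_factors(observed, done)
share = weighted_proportion(document_searches, done, w)
print(w, share)
```

A period that was over-sampled relative to its traffic gets a weight below 1,
and an under-sampled period gets a weight above 1.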

By now you probably have a fairly clear idea of what we have been doing.
Now we can discuss the ultimate question: What do we expect to get out of the
study that can do anyone some good?

Let us start with our gross observations of traffic and other activities in 
the catalog area. We can plot traffic by time of day, day of the week, and time
of the academic year, and can thus produce a clear picture of expected volume
and variation of catalog use. This can be of immediate value to the library
administration (particularly in planning for the provision of reference assistance,
and in scheduling of catalog maintenance), and it can be important in helping to
determine the peak simultaneous access capacity which must be provided in any
future computerized catalog facility. Of course, librarians already know quite
a lot about traffic patterns from long years of experience, so we do not expect
any earth-shaking revelations from this particular result of the study.

Other aspects of our observation of catalog traffic are more novel. We 
have collected much information on the amount of time which users spend at the 
catalog. What proportion of users spends one minute per use, two minutes, five 
minutes, fifteen minutes, etc? From this we can tell what kind of queuing to 
expect in the catalog area, not only with the present level of activity but with
increased future activity as our user population grows. This should give us a
sort of yardstick against which to measure the performance of contemplated
computerized systems, to see whether they are worthy of serious consideration. We
have collected extensive data on the number of catalog cards which users actually
look at in the course of a catalog search, and on the number of references
which they tend to copy from the catalog cards during a search. These data may 
or may not prove useful in furthering our understanding of the catalog user. 
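
As a rough illustration of how time-at-catalog data bear on crowding, the
sketch below applies Little's law (average number present = arrival rate times
average time per use). Little's law is a standard queueing identity brought in
here for illustration; it is not stated to be the method of the study, and the
arrival rates and duration shares are hypothetical.

```python
def mean_users_present(arrivals_per_hour, minutes_per_use_shares):
    """Little's law: mean number present = arrival rate x mean time per use."""
    mean_minutes = sum(minutes * share
                       for minutes, share in minutes_per_use_shares.items())
    return arrivals_per_hour * (mean_minutes / 60.0)

# Hypothetical shares of uses lasting 1, 2, 5, and 15 minutes.
duration_shares = {1: 0.30, 2: 0.30, 5: 0.25, 15: 0.15}

present_now = mean_users_present(120, duration_shares)    # present traffic level
present_later = mean_users_present(180, duration_shares)  # traffic grown by half
print(present_now, present_later)
```

With the same distribution of time per use, the average number of people at
the catalog grows in direct proportion to the arrival rate.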

We have collected data on precisely which catalog drawers were consulted by 
searchers at times when traffic was being observed. This should tell us whether 
all catalog drawers tend to be consulted equally or whether there are high-activity 
areas and low-activity areas in the catalog. This will have an important
bearing on the level of queuing to be expected in a computerized catalog for any
given memory access arrangement. All of these results will be based on very
simple objective observations of the catalog area: merely counting people,
timing people, counting their hand motions in writing down references or flipping
cards, and noting and recording catalog drawer numbers. These measurements require
no interviewing at all. 

The interview data will yield a wealth of potentially useful results. For 
one thing, they will add some useful details to our picture of catalog traffic. 
Since we record the academic status of persons interviewed, we will be able to 
describe separate traffic patterns for students, faculty, staff, outsiders — and 
see whether they differ significantly. We will be able to do the same for new- 
comers to the University (students or faculty), as opposed to old-timers. We 
will be able to do the same for different departmental affiliations or areas of
study. 

Secondly, the interview data will yield quantitative insights into what it 
is that catalog users are seeking, and will tell us whether different categories 
of users tend to bring different types of problems to the catalog. Fairly early 
in the study, it was observed that the objectives of catalog searches tend to 
fall into four rather distinct categories. One category, the "document search," is 
where the user has a specific published work in mind and is using the catalog in 
order to locate a copy of that work. A second category, imperfectly called the 
"author search," is where the user knows of a source of publication — usually but 
not necessarily an author or corporate author — and wants to find out what works 
are available from that source (e.g., what are some books by Thomas Mann?). A
third category is the "subject search," where the user seeks to identify
publications on a known abstract topic. The fourth category is the "bibliographic
search," where the user has no intention of borrowing any book, but is only
interested in finding the catalog card for a known publication so that he may get
some specific information from the catalog card itself (e.g., to complete the
bibliographic citation in a paper he is writing).

The document search is by far the most common. Analysis of a portion of 
our data suggests that about 75 percent of the uses of our catalog are for the 
purpose of locating a specific known publication (which, to our surprise, is al- 
most always available in our collection). The other three use categories are 
more or less equally divided among the remaining 25 percent. 

These results are preliminary, of course. Even if they were final, they 
would be suspect, however. There is a strong possibility or presumption that 
the actions of a library user are shaped by the nature of the catalog facility 
that is available to him. Do library users tend to accommodate themselves to
what our catalog can do very well, such as locate known works? We are getting
an answer to this from a very innocuous sounding but highly revealing question
that we ask in our interviews. It reveals that a significant number of the
document searches performed at the catalog are really subject searches in
disguise. Presumably there would be a smaller proportion of overt document searches
if our library catalogs were better suited for subject searching. We hope to
get at the question of accommodation in yet another way, by looking for any
difference in searching patterns between newcomers to the University and
old-timers, or between newcomers at the beginning of the school year and later in
the school year (when they have had a chance to adjust to reality).

A third, and also very important, type of result expected from our
interview data will be the compilation and analysis of the search clues which
catalog users possess at the start of their searches. By comparing the clues with
the information available in the retrieved catalog cards and the documents they
represent, we can assess the accuracy of the clues. For example, we can tell
how often the catalog users start out with author names or titles that are in- 
accurate or misspelled, and we can analyze the frequency of different types of 
inaccuracies. This is fairly important for designing card catalogs, but it 
could be crucial for computerized catalogs. Computers make no concessions to 
misspelling unless designers take great pains to program around their punctil- 
ious and unyielding accuracy. The data collected from the interview program 
can be used to test the effectiveness of computer algorithms which are intended 
to produce matches despite inaccurate input from the searcher. We have already 
made quantitative evaluations of the effectiveness of two different data com- 
pression algorithms described in the literature by testing them on real data 
from our interview program. 

Last, but by no means least, we will be able to use data from the
interviews and from the retrieved catalog cards, and from the works corresponding
to those catalog cards, to seek means to improve the quality and efficiency of
cataloging rules and catalog structure. We will be able to see whether there
are categories of data included on cards which are rarely wanted, or categories
which are frequently wanted but rarely included. We will be able to throw some
light on the wisdom of dividing a catalog into sections segregated by date of 
publication or by other unconventional distinctions. We should learn whether 
machine-like subject indexing which makes use of the key words occurring in 
book titles, or prefaces, or chapter headings, or indexes, etc., would match 
actual user clues as well as our conventional subject indexing (based on 
authority lists) does now. Or whether it would be even better. 

Of course, we are only studying one library at one university. Will our
results be useful to people outside of Yale? We believe that they will be; 
but I would caution in advance against blind acceptance of any of our results 
as universally relevant. There are bound to be local differences among li- 
braries and universities. To find out how significant these differences can 
be, it would be prudent to conduct studies similar to ours at a considerable 
number of large libraries of different kinds. I was very gratified to learn 
recently that a study of this type will soon be undertaken at the Library of 
Congress. But more studies are needed. I hope that they will not be long in 
coming. After all, the computers are nearly upon us. With all the effort 
that has been going into research and development work on how to computerize 
catalogs, it would be nice to have more guidance on how to do it right. 
Abstract

F. G. Kilgour's truncation algorithm for machine retrieval from large
bibliographic files (1) was tested for performance in matching user-supplied,
unedited search clues to bibliographic data contained in a library catalog.
Kilgour had previously tested the algorithm to identify duplicate book orders
in the in-process list of Yale University Library, and found recall to be about
90%. We have now tested the algorithm, by manual simulation, on data derived
from 126 case studies of actual searches of the catalog at Yale University
Library. The algorithm achieved 70% recall when compared to results of
conventional manual searching. Precision was not determined.



Frederick G. Kilgour (1) has proposed an algorithm for machine retrieval of
bibliographic entries from very large files, including library catalogs. The
algorithm is designed to cope with misspellings and other discrepancies in the
user's input when searching a file that contains entries of high editorial
quality. The algorithm truncates and matches the user's version and the file's
version of author-and-title data in a bibliographic entry. Kilgour reported on a
test of his algorithm in which it was used to check for duplicate book orders in
a 20,000-entry in-process (acquisition) list at Yale University Library. In this
paper we report on a test of his algorithm as applied to a library catalog,
rather than an in-process list.

The opportunity to test Kilgour's method when applied to retrieval from a
library catalog was provided by the ready availability of data derived from a
current study (2) of catalog use at Sterling Memorial Library (3.5 million
books) at Yale University. This study collects, from a rigidly randomized sample
of catalog users, precise information on the clues available to them at the
moment of initiating a search. Search clues are recorded exactly as known to the
catalog user, employing his own spelling, right or wrong. For each catalog user
studied, the outcome of the search is ascertained; complete catalog information
is recorded for documents identified as pertinent in successful searches.

In our test, search clues known to catalog users who seek specific documents,
and catalog data corresponding to documents identified by these users, were
truncated and matched, by manual simulation, according to Kilgour's algorithm.
We were thus able to test its recall performance with real catalog searches. A
test of the method's precision was not immediately feasible, because it would
require comparison of input data with the entire catalog or a substantial
portion of it. However, it is felt that the determination of recall performance
should at least indicate whether the method shows sufficient promise in catalog
searching to warrant evaluation of its precision in such an application.

Data used in our evaluation came from 126 searches in which the catalog user
was successful in locating the specific document he was seeking. The two most
successful versions of Kilgour's truncation algorithm were tested, those with
formulae 3-3-1 and 5-5-1 (where the three figures stand for the number of
initial characters to be retained from the author's last name, the title's first
word, and the title's second word). Both user data and catalog data were
truncated; where truncated versions matched, the entry was considered retrieved.
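
The truncate-and-match step described above can be sketched as follows. This
is a minimal illustration under stated assumptions, not Kilgour's program: the
function names are invented for the example, and the handling of letter case,
punctuation, and leading articles (simply upper-casing and splitting on
spaces) is an assumption, since those details are not given here.

```python
def truncation_key(author_last_name, title, formula=(3, 3, 1)):
    """Build a truncated search key from an author's last name and a title.

    formula gives the number of initial characters kept from the author's
    last name, the title's first word, and the title's second word.
    """
    n_author, n_first, n_second = formula
    words = title.upper().split()
    first = words[0] if len(words) > 0 else ""
    second = words[1] if len(words) > 1 else ""
    return (author_last_name.upper()[:n_author],
            first[:n_first],
            second[:n_second])

def is_retrieved(user_clue, catalog_entry, formula=(3, 3, 1)):
    """An entry counts as retrieved when the two truncated keys are equal."""
    return (truncation_key(*user_clue, formula)
            == truncation_key(*catalog_entry, formula))

# Hypothetical clue with a misspelled author name, and the catalog's version.
clue = ("Hemmingway", "For Whom the Bell Tolls")
entry = ("Hemingway", "For whom the bell tolls")

print(is_retrieved(clue, entry, (3, 3, 1)))  # keys agree on HEM / FOR / W
print(is_retrieved(clue, entry, (5, 5, 1)))  # HEMMI differs from HEMIN
```

In this example the shorter 3-3-1 key tolerates a misspelling that falls after
the third letter of the name, while the longer 5-5-1 key does not; shorter keys
forgive more errors but can be expected to match more unrelated entries.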

It should be noted that certain allowances which favored the algorithm were
made in our test. Kilgour applied his method to only those entries in the file
having a personal or corporate name main entry, thus excluding title main
entries. Some title main entries were included in our sample of 126 catalog
searches, and all but two were considered retrieved, since the user's clue
corresponded perfectly to catalog data; thus any algorithm would have retrieved
them. In two title main entries the user's clue did not match perfectly, so we
eliminated them from our test, reducing the sample to 124. Further, in our test,
in all cases where a user had information on any name entry (not just the main
entry) in the catalog, that information was considered as though it were a main
entry. Thus a user's clue which matched only a joint author and title was still
considered retrieved by us, although in Kilgour's test it could not have been,
since his test was performed on a single-entry file. Finally, where the only
difference was one of punctuation, or where there was a difference because
translated or transliterated data were supplied by the user, full credit was
given and the item was considered retrieved.

In his test on the 20,000-entry in-process list, Kilgour found that his
algorithm produced a precision of 97.3%; that is, 97.3% of the "duplicate"
references retrieved by the algorithm were indeed duplicates. (It should be
noted, however, that precision performance is in part a function of file size,
and would be expected to drop off when applied to much larger files.) Kilgour
tested recall by visual inspection of a sample of the in-process list, and
estimated it to be between 85% and 93%. In other words, about 90% of all real
duplicates were retrieved from the file by his algorithm.

Results of our test showed that of the 124 documents which were located
successfully by manual search in the existing card catalog, 88 were retrieved
by algorithm 3-3-1 and 84 by algorithm 5-5-1, amounting to recall rates of
70.9% and 66.1%, respectively. In 76 out of our sample of 124 searches, the
user's author-and-title information matched corresponding catalog data
character by character, while in 48 cases there were some discrepancies
(misspellings, missing and wrong words, etc.; see Table I). Algorithm 3-3-1 was
able to "heal" discrepancies and retrieve the item in 12 cases; algorithm 5-5-1
in 8 cases. On the other hand, human beings were able to heal these
discrepancies in a far greater number of instances, namely 48.

Table I

Results of Applying Kilgour's Method in Cases
Where User's Clues and Catalog Data
Did Not Match Completely

                                Documents Retrieved
Imperfections                   Method      Method      Documents
in User's Clues                 3-3-1       5-5-1       Not Retrieved*

Neither author nor title                                 2
Author's last name, no title                             9
Title, no author                                         3
Wrong author                                             1
Misspelled author               2                        6 (8)
Wrong words in title            2           2            5
Misspelled words in title       3           1            1 (3)
Transposed words in title       1           1
Incomplete title:
  a. First word correct         4           4
  b. First word incorrect                                4
Entire subtitle, no title                                1
Part of subtitle:
  a. First word correct                                  1
  b. First word incorrect                                3

Total documents                 12          8            36 (40)

*Numbers in parentheses apply to items not retrieved by 5-5-1; all other
numbers in this column apply to items not retrieved by either method.

The viewpoint from which we determined a recall rate of 70% should be clearly
understood. We are considering real document searches that happen to have been
concluded successfully in an actual library with a manual catalog and we have
determined the proportion of these searches which would be concluded
successfully in a hypothetical, computerized library where the only means of
searching the catalog would be by Kilgour's method. In a real library with a
manual catalog, wanted documents can be located many ways, not merely through
a knowledge of author and title (e.g., through subject entries, series entries,
cross references). We do not disqualify any manual approach from consideration.
We are comparing the real world with a specific potential alternative.
Obviously, the use of Kilgour's method in combination with other computer
programs could result in a recall rate higher than 70% by our method of
calculation, and conceivably higher than 100% (because some document searches
on manual catalogs that now end in failure might become successful using new
search methods).

It is interesting to note that our results presented here very closely
correspond to another similar simulation test we conducted (3) on Frederick H.
Ruecking's method (4), where we found recall performance to be also about 70%,
compared to his report of 90%. Ruecking matched unedited, user-supplied
purchase requests against a MARC I tape according to an ingenious
word-compression algorithm. Kilgour matched entries of a largely unedited
in-process list against each other, according to an elegantly simple
truncation algorithm.



We would caution readers against assuming that the same algorithm is likely
to be equally effective in solving problems associated with acquisitions control
and library catalogs. The differences between our results and those reported by
Kilgour and Ruecking demonstrate that the situations with regard to catalog use
and acquisitions control are very different, and that tests made in one
situation cannot be regarded as very reliable for predicting what will happen in
the other. It appears that user clues vary significantly with different types of
application; those brought to the catalog and those supplied on purchase
requests have little in common. The name of the author and the title seem to be
much more consistently (and correctly) supplied by users on purchase requests
than in catalog searches.

On the basis of our tests, it is difficult to regard machine retrieval by
means of word compression or truncation algorithms as a satisfactory substitute
for conventional manual searching of library catalogs. However, improvement in
the performance of machine techniques might be expected from modification of
such algorithms or from the use of combinations of algorithms for retrieval.
Further work on such approaches is highly desirable.



PERFORMANCE OF RUECKING’S WORD-COMPRESSION
METHOD WHEN APPLIED TO MACHINE RETRIEVAL 
FROM A LIBRARY CATALOG 



Ben-Ami LIPETZ, Peter STANGL, and Kathryn F. TAYLOR: 

Research Department, Yale University Library, New Haven, Connecticut 



F. H. Ruecking’s word-compression algorithm for retrieval of bibliographic
data from computer stores was tested for performance in matching
user-supplied, unedited bibliographic data to the bibliographic data contained
in a library catalog. The algorithm was tested by manual simulation, using
data derived from 126 case studies of successful manual searches of the
card catalog at Sterling Memorial Library, Yale University. The algorithm
achieved 70% recall in comparison to conventional searching. Its
acceptability as a substitute for conventional catalog searching methods is
questioned unless recall performance can be improved, either by use of the
algorithm alone or in combination with other algorithms.



Frederick H. Ruecking has published a report (1) of a method for
improving bibliographic retrieval from computerized files when searching
on unverified input data supplied by requestors. The method involves
compression of author-and-title information before comparison. The rules
for compression cause certain types of spelling errors and word
discrepancies to be ignored by the computer. Ruecking reported 90.4% recall
and 98.67% accuracy (precision) in a test of his method in which
unverified book order requests were matched against a MARC I data base
that contained 1392 of the references searched. This paper reports on a
small-scale manual simulation test undertaken to assess the value of the
method when applied to bibliographic retrieval from a library catalog.
The opportunity to test Ruecking’s method when applied to retrieval
from a library catalog was provided by the ready availability of data
derived from a current study (2) of catalog use at Sterling Memorial
Library (3.5 million books) at Yale University. This study collects, from a
rigidly randomized sample of catalog users, precise information on the
clues available to them at the moment of initiating a search. Search clues
are recorded exactly as known to the catalog user, employing his own
spelling, right or wrong. For each catalog user studied, the outcome of
the search is ascertained; complete catalog information is recorded for
documents identified as pertinent in successful searches. Search clues
known to catalog users who seek specific documents correspond to the
“unverified input data” which Ruecking’s method would match against
catalog holdings. Catalog information on those documents identified as
pertinent corresponds to the portion of the data base that Ruecking’s
program seeks to match. It was possible, therefore, to apply Ruecking’s
method by manual simulation, and to test its recall performance in real
catalog searches. A test of its precision was not immediately feasible
because such a test would require comparison of input data with the entire
catalog (or a substantial portion of it). However, the determination of
recall performance would at least indicate whether the method shows
sufficient promise in catalog searching to warrant evaluation of its
precision.

An aside on precision is in order, however. It should be noted that
precision of retrieval with a given method tends to vary inversely with
the size of the file being searched. Although Ruecking did not specify
the number of records included in his MARC I data base, it could not
have exceeded 48,000. Had he run his test on a data base ten, or fifty,
or one hundred times larger, the measured precision would certainly
have been much lower than the figure reported. Any librarian who is
contemplating the adoption of a retrieval technique which has been tested
on a data base similar to, but smaller than, his own should realize that
precision performance must inevitably drop as the data base is increased.
The degree of lowered precision to be expected may be predicted
theoretically or estimated from tests on files of several different sizes.
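
One simple way to see why precision falls as the file grows is sketched below.
The model is an assumption introduced purely for illustration and is not taken
from Ruecking's paper or from this study: each search has one true target,
found with probability equal to the recall, and every other record in the file
is retrieved by mistake with a small fixed probability. The false-match rate
used is hypothetical.

```python
def expected_precision(file_size, recall, false_match_rate):
    """Approximate precision when one record in the file is the true target.

    Assumed model: the true record is retrieved with probability `recall`,
    and each of the other file_size - 1 records is retrieved by mistake with
    probability `false_match_rate`. Returns expected true hits divided by
    expected total hits.
    """
    true_hits = recall
    false_hits = false_match_rate * (file_size - 1)
    return true_hits / (true_hits + false_hits)

# Hypothetical per-record false-match rate, chosen only for illustration.
rate = 1e-6
for size in (48_000, 480_000, 4_800_000):
    precision = expected_precision(size, recall=0.9, false_match_rate=rate)
    print(size, round(precision, 3))
```

Under this model the number of false matches grows in proportion to file size
while the number of true matches does not, so precision must decline as the
data base is enlarged.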

The data used in the evaluation of recall performance reported in this
paper came from 126 searches in which the catalog users had been
successful in locating the specific documents that they were seeking. The
compression coding method described by Ruecking was applied in each
instance to the author-title search clues supplied by the catalog user and
to the author-title information available on the catalog card. Threshold
values were computed for the catalog card data, and retrieval values were
computed for the user data. When the retrieval value was at least as large
as the threshold value, the document was considered “retrieved.”

Ruecking’s method was designed for use with English-language titles
only. Of the 126 catalog searches in the study sample, 20 involved
foreign-language titles. Recall was determined on both the full sample and the
English-language subset of 106 searches. Surprisingly, there is not a great
improvement in performance when foreign-language references are
excluded.

It should be noted that several difficulties were encountered in applying 
Ruecking’s method because of ambiguities in the rules stated in his paper. 
In fact, in his Figure 2 (page 236), of the seventeen illustrations of
compression-coded data retrieved by his program, at least eight appear to
contain departures from the compression-coding rules as stated in the 
paper. His Table 5 (page 235) is scantily described: “Individual Code 
Test” and “Full-Code Test” are not defined; neither are column headings. 
And, contrary to the text (page 234), values in columns five through 
seven are obtained by adding two to the calculated thresholds in only the 
top half of Table 5; in the bottom half, no such regular correlation exists. 
In all cases of ambiguity, the alternative was selected that would tend 
to increase probability of retrieval. For example, Ruecking states (page 
234) that the search program provided for matching of titles on the basis 
of rearrangement of title words, and that the threshold value required 
for retrieval is raised at the same time. Raising this value decreases the 
probability of retrieval, but it is not clear by how much the value is to
be raised. For purposes of the test, the threshold value was not raised 
at all in cases where title words were out of correct sequence, thus re- 
taining maximum probability of retrieval based on the number of matched 
words alone, regardless of their sequence. 

Results of the test showed that, of the 126 documents in the full sample
which were located successfully by manual search in the existing card
catalog, only 88 were retrieved by the compression-code method, a recall
rate of 70%. Considering only the 106 English-language references, 77
were retrieved by the compression-code method, a recall rate of 73%.

The premise for the preceding calculation of recall rate should be clearly 
understood. The test considered real document searches that were con- 
cluded successfully in an actual library using a manual catalog; recall is 
defined here as the proportion of such searches that would be concluded 
successfully in a hypothetical, computerized library where the only means 
of searching the catalog would be by Ruecking’s method. In a real library
with a manual catalog, wanted documents can be located in many ways, 
not merely through a knowledge of author and title (e.g., through subject 
entries, series entries, cross references). The test did not disqualify any 
manual approaches from consideration; it compared the real world with 
a specific potential alternative. Obviously, the use of Ruecking’s method 
in combination with other computer programs could result in a recall rate 
higher than 70% or 73% by the method of calculation employed, and 
conceivably higher than 100% (because some document searches of man- 
ual catalogs that now end in failure might become successful using new 
search methods). 
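
The recall rate as defined here is a simple proportion, and the two figures reported above can be reproduced from the counts given in the text. The following sketch is only an illustration of that arithmetic; the function name `recall_rate` is an assumption made here, not part of the study.

```python
def recall_rate(retrieved: int, located_manually: int) -> float:
    """Share of manually successful searches that the tested method also retrieves."""
    if located_manually <= 0:
        raise ValueError("the number of manually located documents must be positive")
    return retrieved / located_manually


# Counts reported in the text.
full_sample = recall_rate(88, 126)      # all references
english_subset = recall_rate(77, 106)   # English-language references only

print(f"Full sample:    {full_sample:.0%}")
print(f"English subset: {english_subset:.0%}")
```

Because the denominator counts only searches that succeeded manually, a combination of methods that also rescued some manual failures could push this ratio above 100%, as noted above.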

Table 1 provides detailed information on the discrepancies between 
user data and catalog data in the test. With respect to the full sample 
(126 documents), there were 49 documents for which mismatches of data 
were observed. Of these, the compression-code method was able to “heal” 
mismatches in 11 instances to cause retrieval; on the other hand, manual 
searches had achieved retrieval in all 49 instances. With respect to the 
English-language sample (106 documents), there were 37 documents for 
which mismatches of data were observed. Of these, the compression- 
code method was able to “heal” mismatches in 8 instances to cause re- 
trieval; on the other hand, manual searches had achieved retrieval in 
all 37 instances. 

Contrary to expectations, the compression-code method performed 
somewhat worse, or at least no better, in “healing” actual mismatches in 
English references (8 out of 37) than it did with foreign-language refer- 
ences (3 out of 12). The higher overall recall percentage with the English- 



Table 1. Results of Applying Ruecking’s Method in Cases Where User 
Clues and Catalog Data Did Not Match Completely 

                                 Full Sample               English Subset 
                                 (126 documents)           (106 documents) 
Type of Mismatch in User Data    Retrieved  Not Retrieved  Retrieved  Not Retrieved 



Had neither author nor title 
Had author's last name, no title 
Had title, no author 1 

Had wrong author 

Had misspelled author 4 

Had wrong words in title 1 

Had misspelled words in title 2 

Had words transposed in title 2 

Had incomplete title: 

a. First word correct 2 

b. First word incorrect 
Had entire subtitle, no title 
Had part of subtitle 

a. First word correct 

b. First word incorrect 

Total documents*** 11 



2 

9 

2 

1 

4 

9* 

2 



5** 

6 

1 



1 

2 



2 

1 

1 

2 



1 

5 
2 
1 
1 

6 
2 



5** 

5 

1 



1 

2 



38 



29 



* 1 case of correct word stems not matched because of wrong endings. 

** 2 cases of long or composite titles with maximum threshold values 
contained in input words but not among the first four significant 
words. 

*** Figures shown are lower than totals of figures in columns because 
some documents had two or more types of mismatch. 
language subset is attributable entirely to the fact that users had com- 
plete and correct data more frequently for English references (69 out of 
106) than they did for foreign-language references (8 out of 20). Thus, 
regardless of original intent, the method works equally well (or equally 
poorly, depending on one’s viewpoint) on foreign-language and English 
references. If foreign-language references had been systematically ignored 
in applying the test to catalog searches, some 16% (20 out of 126) of 
the searches would have been excluded, with no real gain in performance. 
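
The comparison between the two language groups can be checked by splitting each group into references with complete and correct user data and references with mismatches. The sketch below uses only counts stated in the text. It assumes that every reference with complete and correct data was retrieved by the method, which is consistent with the reported totals (69 + 8 = 77 English, and 77 + 11 = 88 overall). The `Subset` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subset:
    total: int      # references located by manual search
    complete: int   # user had complete and correct data
    healed: int     # mismatched references still retrieved by the method

    @property
    def mismatched(self) -> int:
        return self.total - self.complete

    @property
    def retrieved(self) -> int:
        # Assumption: complete and correct data always leads to retrieval.
        return self.complete + self.healed

    @property
    def heal_rate(self) -> float:
        return self.healed / self.mismatched

    @property
    def recall(self) -> float:
        return self.retrieved / self.total


english = Subset(total=106, complete=69, healed=8)
foreign = Subset(total=20, complete=8, healed=3)

for name, subset in (("English", english), ("Foreign-language", foreign)):
    print(f"{name}: healed {subset.healed} of {subset.mismatched} "
          f"({subset.heal_rate:.0%}), recall {subset.recall:.0%}")
```

On these figures the healing rate is about 22% for English references and 25% for foreign-language references, so the higher English recall comes from the larger share of complete data rather than from better healing.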

The block of interviews from which the searches used in this test were 
drawn included 10 unsuccessful document searches in addition to the 126 
successful searches. One could speculate on whether the compression- 
code method would have been able to “heal” these failures, resulting in 
a higher performance rating. The indications are, however, that the 
chances of such healing are close to zero. In a majority of these unsuc- 
cessful searches, the available data were incomplete or were not of the 
type that the method is intended to utilize. In the few remaining cases, 
it is very likely that the searches were unsuccessful simply because the 
desired documents were not in the library collection. 

Recall performance as measured by the test could have been improved 
by modifying Ruecking’s rules to some extent. For example, five more 
titles would have been retrieved had the assigned retrieval value been 
increased by two units in cases where the first title word matched cor- 
rectly; this would have increased overall recall performance from 70% 
to 74%. A further increase to 76% would have resulted from matching 
the user’s version of the title with the catalog’s subtitle, or with portions 
of titles which follow a punctuation mark (in addition to matching with 
the actual title in the catalog). 
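
The effect of these rule changes on overall recall follows directly from the sample counts. The sketch below reproduces the 70% and 74% figures from the numbers given in the text. The text does not state how many additional titles the subtitle rule would retrieve; the last line only infers which retrieved count is consistent with a rounded figure of 76%. All names are assumptions made for illustration.

```python
SAMPLE_SIZE = 126          # documents located by manual search
BASELINE_RETRIEVED = 88    # retrieved under the rules as tested
FIRST_WORD_GAIN = 5        # extra titles if a correct first word adds two units


def recall_percent(retrieved: int, sample_size: int = SAMPLE_SIZE) -> int:
    """Recall as a whole-number percentage, rounded as in the text."""
    return round(100 * retrieved / sample_size)


baseline = recall_percent(BASELINE_RETRIEVED)
with_first_word_rule = recall_percent(BASELINE_RETRIEVED + FIRST_WORD_GAIN)

# Inference, not a reported figure: which counts round to the stated 76%?
counts_giving_76 = [n for n in range(SAMPLE_SIZE + 1) if recall_percent(n) == 76]

print(baseline, with_first_word_rule, counts_giving_76)
```

Under these assumptions only one retrieved count rounds to 76%, which would imply a further gain of about three titles from the subtitle rule.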

Extension of the compression code to include publisher and date as 
well as author and title would do little or nothing to improve the per- 
formance of this method. The test data, although admittedly a small sam- 
ple, indicate that users who do not have accurate author and title informa- 
tion when they begin a search very rarely have accurate information on 
any other descriptive data element. 

It is, of course, a matter for individual judgment as to whether the 
performance of the compression-code method, as indicated by the test 
reported here, is sufficiently good to make it attractive for use in some 
computerized alternative to the manual library catalog. In the authors’ 
opinion, Ruecking’s method does not in itself supply an adequate solution 
to the problem of searching a computerized catalog. However, further 
investigation seems warranted along two lines. First, the method might 
be modified to give better performance in this application. Second, it 
might be used in combination with some other computer methods to give 
searching performance approaching that which is attained today by the 
manual searching of card catalogs. 

APPENDIX F. 



FORMS USED FOR INTERVIEWING AND FOLLOW UP 










INTERVIEW SCHEDULE 


Number: 




Interviewer 


Date 


Time 


Door 


Time began 


Door done 



Pardon me, Sir (Madam), we are conducting a study of the use of the card catalog, and I would like to ask you a 
few questions if I may. 



1.1 Have we interviewed you here at the catalog before? 

[ ] yes [ ] no 

If yes, number of times 

2.1 Could you please tell me precisely what you were about to do here at the catalog the moment I interrupted 
you? 



2.2 Do you have anything written down, anything to guide you in your current search? 

[ ] yes [ ] no 

If yes, copy obtained? 

[ ] yes [ ] no 

2.3 Do you know anything about this work in addition to what's written down? 



2.4 Suppose that I would offer to do this search for you; how would you instruct me to ensure that I would do 
it right? 

A. [ ] Please show me just what you are doing, as though you were showing a new assistant whom you are 
training to do this. 

B. [ ] We need specific data on what people are looking for in the catalog, so we can compare this with 
what the catalog contains. 

C. [ ] Are you looking for a specific publication you know about? 

D. [ ] Are you looking for information on a topic, rather than a specific publication? 

E. [ ] Are you just trying to find out something about a specific publication? 



[ ] Document search 
[ ] Subject search 
[ ] "Author" search 
[ ] Bibliographic search 



Document and Bibliographic Search Followup 



Note: for bibliographic search skip questions 2.3 and 2.4. 

When finished with catalog drawer, 

1. Have you found the item? 

[ ] yes [ ] no 

2.1 If yes, call no.: 

2.21 Under what did you find it? 



2.22 Are you satisfied that this is the item you were looking for? 
[ ] yes [ ] no 

Comment: 



2.3 Did you pick out something in addition to what you were looking for (that is, something in the same 
connection)? 

[ ] yes [ ] no 

2.4 If yes, call no.: 



How did you come across it? 



3.1 If no, what did you look under? 



3.2 What are you going to do next? 



Time ended: : a.m. p.m. 




BACKGROUND 



1. In what connection do you need this work? 

Student 

[ ] course work 
[ ] paper 
[ ] undergraduate thesis 
[ ] research project 
[ ] orals 
[ ] dissertation 
[ ] interest 
[ ] exam 
[ ] other 



Non-student 

[ ] lecture preparation 
[ ] publication 
[ ] research project 
[ ] looking for ideas 
[ ] professional reading 
[ ] interest 
[ ] other 



2. How or where did you find out about this work? 

[ ] someone mentioned it 
[ ] someone wrote it down 
[ ] was in footnote or 
bibliography in book 
[ ] found in reference book 
[ ] course list 
[ ] saw work itself 
[ ] own list 
[ ] owns work itself 
[ ] has always known 
[ ] other 



[ ] remembered it 
[ ] hand-copied it 
[ ] machine-copied it 
[ ] have original source 



Statistical data 



1.1 Affiliation: 

[ ] Undergraduate 
11 [ ] Freshman 
12 [ ] Sophomore 
23 [ ] Junior 
24 [ ] Senior 
3 [ ] Graduate student year 
4 [ ] Faculty 
5 [ ] Staff 
6 [ ] Non-Yale student 
7 [ ] Non-Yale faculty 
8 [ ] Faculty or student wife 
9 [ ] Other 



1.2 How many years have you been using this library? years 

2.1 If not a freshman or sophomore: What department are you with? 

or 

What is your major? 

2.2 If faculty or staff: What is your title or the name of your position? 



3. Sex: 1 [ ] Male 2 [ ] Female 

4. How often do you use the catalog at Sterling Library: 

[ ] several times a day 
[ ] daily 
[ ] 3-4 times a week (every other day) 
[ ] 2 times a week (every 3-4 days) 
[ ] weekly 
[ ] 2-3 times a month 
[ ] monthly 
[ ] less frequently 

5.1 Which other Yale libraries do you use regularly, if any? [ ] none 



5.2 Do you use these just to study in, or do you use their collections? 

Libraries:          Study    Use collections 
                    [ ]      [ ] 
                    [ ]      [ ] 
                    [ ]      [ ] 

6. Interview ended: 




DOCUMENT SEARCH 



2.1 Please describe the item to me as fully as possible. 

A. [ ] If you had something written down, what would it say? 

B. [ ] If I offered to get this item for you, what more would you tell me to enable me to find it? 



2.2 Is that a book or a periodical? 
[ ] book [ ] periodical 
If periodical, is reference to 
[ ] article [ ] title only 

2.21 If book, is reference to the whole work? 

[ ] whole work [ ] section 

2.22 If whole work, do you need the work as a whole, or do you just need to use some specific part of it? 
[ ] whole work [ ] section 
Which specific part? 



2.3 Have you had any previous contact with this publication? 

[ ] yes [ ] no 

If serial, do you read this journal regularly, or just an occasional article? 
[ ] regularly [ ] article 
2.31 If yes, was that the Library's copy? 

[ ] yes [ ] no . 

2.4 If yes, what do you remember about the physical appearance of the publication? 



2.5 Under what are you going to look in the catalog? 



2.6 How many items would you like to obtain on this occasion? 

1. Post from p.1 and Notes 

p.1 notes 

1.1 If monograph: 

[ ] [ ] author or editor 
[ ] [ ] title 
[ ] [ ] subtitle 
[ ] [ ] edition and date of publication 
[ ] [ ] translation from 
[ ] [ ] publisher 
[ ] [ ] place of publication 
[ ] [ ] series 
[ ] [ ] sponsoring organization 
[ ] [ ] size, paging, color, binding 
[ ] [ ] illustrations 
[ ] [ ] bibliography 
[ ] [ ] anything else 

1.2 If serial: 

[ ] [ ] title of journal 
[ ] [ ] subtitle of journal 
[ ] [ ] editor 
[ ] [ ] sponsoring organization 
[ ] [ ] publisher 
[ ] [ ] place 
[ ] [ ] starting date and frequency 
[ ] [ ] change in title 
[ ] [ ] size and appearance 
[ ] [ ] author of article 
[ ] [ ] title of article 
[ ] [ ] volume, issue number, date 
[ ] [ ] anything else 

3. For any category not covered, ask 
"Do you know the ?" 

Interview number: 

Note: 






SUBJECT SEARCH 



If it is apparent early in the interview that final choice of items 
will be made outside the catalog, skip questions 2.6 and 2.7. 



1.1 What is the subject on which you are searching material? 



1.2 How would you describe the topic, if I were to offer to do the 
search for you? 



1.3 Do you think you would want to tell me more, so that I would not 
pick out things you don’t want? 



1.4 Do you think you would want to tell more, so that I would not 
overlook anything? 



2.1 How do you plan to find material on your topic? 



2.2 O.K., under what are you going to look in the catalog? 



2.3 Are you planning to look under anything else in the catalog? 



2.4 And if that doesn’t produce enough, are you then going to look 
under anything else? 



2.5 For each heading given, ask "You mean, just like that?", or equivalent, 
to get an accurate form of the heading. Give numbers to 
headings elicited. 



2.6 Supposing again that I would do this search for you. When I look 
under these headings, is there anything I should look out for, so 
I wouldn’t pick out things you don’t really want? 



2.7 When you look under these headings, will any of the following 
influence your choice? 

[ ] author or editor 
[ ] title (as description 
of topic) 
[ ] date of publication 
[ ] language 
[ ] author or sponsoring 
organization 
[ ] contents note 
[ ] call number 
[ ] illustrations 
[ ] bibliography 
[ ] anything else 



SUBJECT SEARCH continued 



3. Approximately how many items do you wish to obtain? 



4.1 When you are finished with the catalog here, do you think you will 
know exactly which publications you want to obtain, or will you 
need to look or ask anywhere else before you make your final choice? 

[ ] will know 

[ ] look elsewhere Where? 



4.2 Why? 

[ ] books not there Other: 

[ ] can't tell from catalog 



4.3 What are you going to base your final choice of publications on? 



4.4 (When you look in the stacks) Will any of the following influence 
your choice? 

[ ] author or editor 
[ ] title (as description 
of topic) 

[ ] date of publication 
[ ] place of publication 
[ ] language 

[ ] author or sponsoring 
organization 
[ ] table of contents 
[ ] index 
[ ] preface 
[ ] illustrations 
[ ] bibliography 
[ ] size and appearance 
[ ] anything else not covered? 



Request followup. 






AUTHOR SEARCH 



Interview 
number : 



[ ] 1.1 Please describe your current search to me as accurately as possible. 



1.2 Supposing that I were to offer to do the search for you, would you tell me 
anything more, so that I wouldn’t pick out anything you don’t want? 



1.3 Would you tell me anything more, so that I would not overlook anything? 



2.1 How do you plan to find the material you want? 



2.2 O.K., under what are you going to look in the catalog? 



2.3 (If it applies) Are you planning to look under anything else? 



2.4 When you look in the catalog, would any of the following influence your 
choice? (ask all that apply) 

[ ] author or editor 
[ ] title (as description 
of topic) 
[ ] subject headings 
[ ] edition and date 
of publication 
[ ] place of publication 
[ ] language 
[ ] publisher 
[ ] series 
[ ] sponsoring organization 
[ ] size and paging 
[ ] illustrations 
[ ] bibliography 
[ ] anything else 



3. Approximately how many items do you wish to obtain? 



4.1 When you are finished with the catalog here, do you think you will know 

exactly which publications you want to obtain, or will you need to look or 
ask anywhere else before you make your final choice? 

[ ] will know 

[ ] look elsewhere Where? 



4.2 Why? 

[ ] books not there 

[ ] can’t tell from catalog Other: 



4.3 What are you going to base your final choice of publications on? 



4.4 (When looking in stacks) Will any of the following influence your choice? 

[ ] author or editor 
[ ] title (as description 
of topic) 
[ ] edition and date of 
publication 
[ ] place of publication 
[ ] publisher 
[ ] language 
[ ] series 
[ ] sponsoring organization 
[ ] size 
[ ] illustrations 
[ ] bibliography 
[ ] table of contents 
[ ] index 
[ ] preface 



Subject and Author Search Followup 



If it was indicated that choice was going to be made at catalog, approach user 
as he is leaving catalog area. 

1. Did you find your material? 

[ ] yes [ ] no 

2.1 If yes, call numbers: 

2.2 Under what did you find these? 

2.21 (If not the intended headings) What made you look there? 

2.3 Are you satisfied that these are the works you need, will you take them? 

2.4 Is this enough, are you now finished? 

If 2.3 and 2.4 indicate that search is not over, request followup. 

3.1 If no, did you find the headings you were looking for? 

[ ] Found nothing 

[ ] Found: [ ] Not found: 

3.2 (If headings found are not the intended headings) What made you look there? 

3.3 What was wrong with the material under those headings? How did you decide 
they were not appropriate? 

3.4 What are you going to do next? 

If indicated, request followup. 



On return from stacks. 

1.1 How far did you go at the catalog? 

Get call numbers or portions obtained at catalog: 

1.2 Under what did you find these? 

2.1 What are your final choices? 

Call numbers: 

2.2 How did you choose these? 



Interview number: 



BIBLIOGRAPHIC SEARCH 



[ ] 

2.1 Please describe the item to me as fully as possible. 

A. [ ] If you had something written down, what would it say? 

B. [ ] If I offered to find this item for you, what more could you tell me 
to enable me to find it? 



2.2 Is that a book or a periodical? 

[ ] book [ ] periodical 

2.3 Exactly what do you need to find out about this publication? 

[ ] verification Comment, details: 

[ ] specific fact 
[ ] other 



2.4 Have you had any previous contact with this publication? 

[ ] yes [ ] no 

2.41 If yes, was that the Library’s copy? 

[ ] yes [ ] no 

2.5 If yes, what do you remember about the physical appearance of the publication? 



2.6 Under what are you going to look in the catalog? 

2.7 How many items are you looking up on this occasion? 

1. Post from p.1 and Notes 

p.1 notes 

If monograph: 

[ ] [ ] author or editor 
[ ] [ ] title 
[ ] [ ] subtitle 
[ ] [ ] edition and date of publ. 
[ ] [ ] translation from, lang. 
[ ] [ ] publisher 
[ ] [ ] place of publ. 
[ ] [ ] series 
[ ] [ ] sponsoring org. 
[ ] [ ] size, paging, color, binding 
[ ] [ ] illustrations 
[ ] [ ] bibliography 
[ ] [ ] anything else 

1.2 If serial: 

[ ] [ ] title of journal 
[ ] [ ] subtitle of journal 
[ ] [ ] editor 
[ ] [ ] sponsoring org. 
[ ] [ ] publisher 
[ ] [ ] place of publ. 
[ ] [ ] starting date and frequency 
[ ] [ ] change in title 
[ ] [ ] size and appearance 
[ ] [ ] author of article 
[ ] [ ] title of article 
[ ] [ ] vol., issue no., and date 
[ ] [ ] anything else 

3. For any category not covered, ask: 
"Do you know the ?" 

Stack areas [illegible] catalog 

AREA: 

FOUND UNDER: 
AUTHOR 
TITLE 
SERIES 
SUBJECT (SPECIFY) 

CALL # 

BOOKS CHOSEN BY BROWSING IN STACKS 
CHOSEN ON THE BASIS OF INFORMATION ON: 
TITLE PAGE 
PREFACE 
TABLE OF CONTENTS 
INTRODUCTION 
OTHER REASONS (SPECIFY) 



APPENDIX F 



FACTORS COMPARED IN COVARIANCE ANALYSIS 





1. Day of the week 

2. Time of day (half hours) 

3. Season of the academic calendar 

4. Entryway to catalog area 

5. Reason for an interview not being done as scheduled 

6. Previous interviews undergone by person interviewed 

7. Interviewer 

8. Duration of interview (minutes) 

9. Promptness of follow-up on search results (immediately after catalog 
search or deferred pending further user activity.) 

10. Completeness of follow-up 

11. Issuance and return of questionnaires for follow-up 

12. Academic status of person interviewed 

13. Departmental affiliation or major subject of person interviewed 

14. Years of experience with Yale University libraries 

15. Sex 

16. Frequency of past use of the catalog 

17. Number of other Yale libraries used by person interviewed 

18. Types of other Yale libraries used 

19. Number of items desired in this use of the catalog 

20. Type of catalog search (document, subject, author, bibliographic) 

21. Underlying intent of document search (document, subject) 

22. Connection in which material is needed (course work, paper, thesis, 
project, personal, etc.) 

23. Source of reference to a desired document 







24. Type of reference to a desired document (remembered, hand copied, 
duplicated) 

25. Type of document desired (monograph, periodical) 

26. Nature of previous contact with document 

27. Intended search approach 

28. Language of desired document 

29. Is a translation involved? 

30. Personal author clue (availability and accuracy) 

31. Corporate author clue (availability and accuracy) 

32. Title clue for which catalog contains an entry (availability and accuracy) 

33. Title clue for which catalog contains no entry (availability and accuracy) 

34. Added entry clues (availability and accuracy) 

35. Date clues (type specificity and whether "known" or guessed) 

36. Accuracy of date clues 

37. Actual document dates (in ranges of years) 

38. Other search clues (availability and accuracy) 

39. Result of search (nothing found, desired material only, desired material 
plus additional material, additional material only) 

40. Was intended search entry successful? (yes, no) 

41. Type of heading under which desired material was located if different 
from intended approach. 

42. Number of entries looked up 

43. Number of useful call numbers found in addition to desired material 

44. User intent following unsuccessful search (abandon, continue within 
Yale library system, continue elsewhere) 